Inference Algorithms for Bayes Networks

Upload: maximillian-green

Post on 21-Jan-2016


Page 1: Inference Algorithms for Bayes Networks

Inference Algorithms for Bayes Networks

Page 2

Outline

Bayes Nets are popular representations in AI, and researchers have developed many inference techniques for them.

We will consider two types of algorithms:

1) Exact inference
– Enumeration
– Variable elimination
– Other techniques not covered: junction tree, loop cutset conditioning, …

2) Approximate inference (sampling)
– Rejection sampling
– Likelihood weighting
– Gibbs sampling

Page 3

First: Notation

I’m going to assume all variables are binary.

For a random variable A, I will write the event that A is true as +a, and the event that A is false as –a.

Similarly for the other variables.

[Network diagram: A → C ← B, with C → D and C → E]

Page 4

Technique 1: Enumeration

This is the “brute-force” approach to BN inference.

[Network diagram: A → C ← B, with C → D and C → E]

Suppose I want to know P(+a | +b, +e).

Algorithm:
1) If the query is conditional (yes in this case), rewrite it with the definition of conditional probability.

2) Use marginalization to rewrite marginal probabilities in terms of the joint probability. e.g.,

3) Use the Bayes Net equation to determine the joint probability.

P(+a | +b, +e) = P(+a, +b, +e) / P(+b, +e)

P(+a, +b, +e) = ∑_c ∑_d P(+a, +b, c, d, +e)
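As a concrete sketch of the three steps: the deck gives no numeric CPTs for this five-node network, so the code below borrows the four-node network and CPTs from the sampling slides (Pages 25-29) and answers the analogous query P(+a | +b, +d) by brute-force enumeration. The function names are mine, not the deck's.

```python
from itertools import product

# CPTs from the sampling example on Pages 25-29:
# A is a root; A -> B, A -> C; B and C -> D.  All variables are binary.
P_A = 0.6                                        # P(+a)
P_B = {True: 0.7, False: 0.6}                    # P(+b | a)
P_C = {True: 0.4, False: 0.9}                    # P(+c | a)
P_D = {(True, True): 0.5, (True, False): 0.6,
       (False, True): 0.2, (False, False): 0.3}  # P(+d | b, c)

def joint(a, b, c, d):
    """Step 3: the Bayes Net equation gives the full joint probability."""
    p = P_A if a else 1 - P_A
    p *= P_B[a] if b else 1 - P_B[a]
    p *= P_C[a] if c else 1 - P_C[a]
    p *= P_D[(b, c)] if d else 1 - P_D[(b, c)]
    return p

def marginal(**fixed):
    """Step 2: marginalize, i.e. sum the joint over every unfixed variable."""
    names = ("a", "b", "c", "d")
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(assignment[var] == val for var, val in fixed.items()):
            total += joint(**assignment)
    return total

# Step 1: rewrite the conditional query with the definition of
# conditional probability: P(+a | +b, +d) = P(+a, +b, +d) / P(+b, +d).
answer = marginal(a=True, b=True, d=True) / marginal(b=True, d=True)
print(round(answer, 4))  # 0.6577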

Page 5

Speeding up Enumeration

Pulling out terms:

∑_c ∑_d P(+a) P(+b) P(c | +a, +b) P(d | c) P(+e | c)
= P(+a) P(+b) ∑_c ∑_d P(c | +a, +b) P(d | c) P(+e | c)
= P(+a) P(+b) ∑_c P(c | +a, +b) P(+e | c) ∑_d P(d | c)

Each term in the sum is faster to compute. But the total number of terms (things to add up) remains the same. In the worst case, this is still exponential in the number of nodes.

Page 6

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Network diagram: Burglary, Earthquake → Alarm → John calls, Mary calls]

Let’s re-create the network above, but start with the “John calls” node and gradually add more nodes and edges.

Let’s see how many edges/dependencies we end up with.

Page 7

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Diagrams: the causal network (Burglary, Earthquake → Alarm → John calls, Mary calls), and a new network started from just John calls and Mary calls. Is there an edge between them?]

Page 8

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Diagrams: the causal network, and the new network after adding Alarm. Which edges does Alarm need to John calls and Mary calls?]

Page 9

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Diagrams: the causal network, and the new network after adding Burglary. Which edges does Burglary need?]

Page 10

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Diagrams: the causal network, and the new network after adding Earthquake. Which edges does Earthquake need?]

Page 11

Maximize Independence

If you can, it helps to create the BN so that it has as few edges as possible.

[Diagrams: the causal network, and the completed network built in the John-calls-first order]

Page 12

Causal Direction

Moral: Bayes Nets tend to be the most compact, and most efficient, when edges go from causes to effects.

[Diagrams: causal direction (Burglary, Earthquake → Alarm → John calls, Mary calls) vs. non-causal direction (the same variables with edges built from John calls and Mary calls upward)]

Page 13

Technique 2: Variable Elimination

[Network diagram: A → C ← B, with C → D and C → E]

Suppose I want to know P(+a | +b, +e).

Algorithm:
1) If the query is conditional (yes in this case), rewrite it with the definition of conditional probability.

2) For each marginal probability, apply variable elimination to find its value. E.g., for P(+a, +b, +e):

a. Join C & D (multiplication)
b. Eliminate D (marginalization)
c. Join C & +e (multiplication)
d. Eliminate C (marginalization)
e. Join +a & +e (multiplication)
f. Join +b & (+a, +e) (multiplication)
g. Done.

P(+a | +b, +e) = P(+a, +b, +e) / P(+b, +e)

𝑃 (+𝑎 ,+𝑏 ,+𝑒)

Page 14

Joining D & C

[Diagram: original network A → C ← B, C → D, C → E]

The Bayes Net provides: P(C | +a, +b) and P(D | C).

Joining D & C will compute P(D, C | +a, +b)

[Diagram: after the join, C and D are merged into one node “C, D”: A → (C, D) ← B, (C, D) → E]

For each c and each d, compute: P(d, c | +a, +b) = P(d | c) * P(c | +a, +b)

Page 15

Eliminating D

The Bayes Net now provides: P(D, C | +a, +b). Eliminating D will compute P(C | +a, +b).

[Diagram: after eliminating D: A → C ← B, C → E]

For each c, compute: P(c | +a, +b) = ∑_d P(d, c | +a, +b)

[Diagram: before the elimination, merged node “C, D”: A → (C, D) ← B, (C, D) → E]

Page 16

Joining C and +e

The Bayes Net now provides: P(C | +a, +b) and P(+e | C). Joining C and +e will compute P(+e, C | +a, +b).

[Diagram: after the join, merged node “C, E”: A → (C, E) ← B]

For each c, compute: P(+e, c | +a, +b) = P(c | +a, +b)*P(+e | c)

[Diagram: before the join: A → C ← B, C → E]

Page 17

Eliminating C

The Bayes Net now provides: P(+e, C | +a, +b). Eliminating C will compute P(+e | +a, +b).

[Diagram: after eliminating C: A → E ← B]

Compute: P(+e | +a, +b) = ∑_c P(+e, c | +a, +b)

[Diagram: before the elimination, merged node “C, E”: A → (C, E) ← B]

Page 18

Joining +a, +b, and +e

The Bayes Net now provides: P(+e | +a, +b), P(+a), and P(+b). Joining +a, +b, and +e will compute P(+e, +a, +b).

[Diagram: after the join, a single node “A, B, E”]

Compute: P(+e, +a, +b) = P(+e | +a, +b) * P(+a) * P(+b)

[Diagram: before the join: A → E ← B]
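The join/eliminate steps above can be sketched with a tiny factor library. The factor representation and function names here are my own choices, not the slides'; since the five-node network has no numeric CPTs, this sketch again uses the CPTs from the sampling slides (Pages 25-29) and computes P(+a | +b, +d), folding the evidence +b, +d into the factors up front by keeping only the matching CPT rows.

```python
from itertools import product

# A factor is (variables, table): `table` maps a tuple of bool values
# (ordered like `variables`) to a number.

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for vals in product([True, False], repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)]
                       * t2[tuple(a[v] for v in v2)])
    return (vs, table)

def eliminate(f, var):
    """Sum a variable out of a factor (marginalization)."""
    vs, t = f
    i = vs.index(var)
    out = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return (vs[:i] + vs[i + 1:], out)

# CPTs from the sampling slides (Pages 25-29), with evidence rows selected:
f_A = (["A"], {(True,): 0.6, (False,): 0.4})                  # P(A)
f_B = (["A"], {(True,): 0.7, (False,): 0.6})                  # P(+b | A)
f_C = (["A", "C"], {(True, True): 0.4, (True, False): 0.6,
                    (False, True): 0.9, (False, False): 0.1}) # P(C | A)
f_D = (["C"], {(True,): 0.5, (False,): 0.6})                  # P(+d | +b, C)

f = eliminate(join(f_C, f_D), "C")   # join C's factors, then sum C out
f = join(join(f_A, f_B), f)          # fold in the remaining factors
vs, table = f
norm = sum(table.values())           # = P(+b, +d)
print(round(table[(True,)] / norm, 4))  # P(+a | +b, +d): 0.6577
```

Brute-force enumeration gives the same 98/149 ≈ 0.6577, as it must: variable elimination only reorders the sums and products.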

Page 19

Notes on Time Complexity

For graphs that are trees with N nodes, variable elimination can perform inference in time O(N).

For general graphs, variable elimination can perform inference in time O(2^w), where w is the “tree-width” of the graph. (However, this depends on the order in which variables are eliminated, and finding the best order is itself hard.)

Intuitively, tree-width is a measure of how close a graph is to an actual tree.

In the worst case, this can mean a time complexity that is exponential in the size of the graph.

Exact inference in BNs is known to be NP-hard.

Page 20

Approximate Inference via Sampling

Penny   Nickel   Count   Probability
Heads   Heads    0       ?
Heads   Tails    0       ?
Tails   Heads    0       ?
Tails   Tails    0       ?

Page 21

Approximate Inference via Sampling

Penny   Nickel   Count   Probability
Heads   Heads
Heads   Tails    1       1
Tails   Heads
Tails   Tails

Page 22

Penny   Nickel   Count   Probability
Heads   Heads
Heads   Tails    1       1
Tails   Heads
Tails   Tails

Approximate Inference via Sampling

Penny   Nickel   Count   Probability
Heads   Heads    1       .5
Heads   Tails    1       .5
Tails   Heads    0
Tails   Tails    0

Page 23

Penny   Nickel   Count   Probability
Heads   Heads    1       .5
Heads   Tails    1       .5
Tails   Heads    0
Tails   Tails    0

Approximate Inference via Sampling

Penny   Nickel   Count   Probability
Heads   Heads    2       .67
Heads   Tails    1       .33
Tails   Heads    0
Tails   Tails    0

Page 24

Approximate Inference via Sampling

Penny   Nickel   Count   Probability
Heads   Heads    53      .2465
Heads   Tails    56      .2605
Tails   Heads    52      .2419
Tails   Tails    54      .2512

As the number of samples increases, our estimates should approach the true joint distribution.

Conveniently, we get to decide how long we want to spend to figure out the probabilities.
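This two-coin experiment is easy to simulate. A quick sketch (the seed and sample count are arbitrary choices of mine):

```python
import random
from collections import Counter

# Flip a fair penny and a fair nickel N times, tallying the four outcomes.
rng = random.Random(0)
N = 20000
counts = Counter()
for _ in range(N):
    penny = "Heads" if rng.random() < 0.5 else "Tails"
    nickel = "Heads" if rng.random() < 0.5 else "Tails"
    counts[(penny, nickel)] += 1

for outcome in sorted(counts):
    # Each estimate drifts toward the true joint probability, 0.25.
    print(outcome, counts[outcome] / N)
```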

Page 25

Generating Samples from a BN

[Network diagram: A → B, A → C; B, C → D]

A    P(A)
+a   .6

A    C    P(C|A)
+a   +c   .4
-a   +c   .9

A    B    P(B|A)
+a   +b   .7
-a   +b   .6

Sample generation algorithm:

For each variable X that has not been assigned, but whose parents have all been assigned:

1. r ← a random number in the range [0, 1]
2. If r < P(+x | parents(X)), then assign X ← +x
3. Else, X ← -x

For this example:

At first, A is the only variable whose parents have been assigned (since it has no parents).

r ← 0.3
0.3 < P(+a), so we assign A ← +a

B    C    D    P(D|B,C)
+b   +c   +d   .5
+b   -c   +d   .6
-b   +c   +d   .2
-b   -c   +d   .3

Page 26

Generating Samples from a BN

(Same network, CPTs, and sample generation algorithm as on Page 25.)

For this example: Current Sample: +a

Next, both B and C have all their parents assigned. Let’s choose B.

r ← .9
.9 >= P(+b | +a), so we set B ← -b


Page 27

Generating Samples from a BN

(Same network, CPTs, and sample generation algorithm as on Page 25.)

For this example: Current Sample: +a, -b

Quiz: what variable would be assigned next?

If r ← .4, what would this variable be assigned?


Page 28

Generating Samples from a BN

(Same network, CPTs, and sample generation algorithm as on Page 25.)

For this example: Current Sample: +a, -b, -c

Now D has all its parents assigned.

If r ← .2, what would D be assigned?

Page 29

Generating Samples from a BN

(Same network, CPTs, and sample generation algorithm as on Page 25.)

For this example: Current Sample: +a, -b, -c, +d

That completes this sample.

We can now increase the count of (+a, -b, -c, +d) by 1,and move on to the next sample.
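The whole sample-generation loop can be sketched in code. The CPT values are the ones from the tables on Page 25; the helper name, seed, and sample count are my own choices:

```python
import random
from collections import Counter

# CPTs from the slides; A -> B, A -> C; B, C -> D.
P_A = 0.6
P_B = {True: 0.7, False: 0.6}
P_C = {True: 0.4, False: 0.9}
P_D = {(True, True): 0.5, (True, False): 0.6,
       (False, True): 0.2, (False, False): 0.3}

def one_sample(rng):
    """Assign each variable only after its parents, exactly as in the slides:
    draw r in [0, 1] and assign +x when r < P(+x | parents)."""
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    c = rng.random() < P_C[a]
    d = rng.random() < P_D[(b, c)]
    return (a, b, c, d)

rng = random.Random(0)
N = 20000
counts = Counter(one_sample(rng) for _ in range(N))

est_p_a = sum(n for sample, n in counts.items() if sample[0]) / N
print(round(est_p_a, 2))  # close to P(+a) = 0.6
```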

Page 30

Quiz: Approximating Queries

Suppose I generate a bunch of samples for a BN with variables A, B, C, and get these counts.

What are these probabilities?

P(+a, -b, -c)?

P(+a, -c)?

P(-a | -b, -c)?

P(-b | +a)?

A    B    C    Count
+a   +b   +c   20
+a   +b   -c   30
+a   -b   +c   50
+a   -b   -c   30
-a   +b   +c   30
-a   +b   -c   20
-a   -b   +c   80
-a   -b   -c   40

Total: 300
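One way to grind through such a count table mechanically (the `prob` helper is my own, and the comments give away the quiz answers):

```python
# The counts from the table above, keyed by (A, B, C); total is 300.
counts = {
    ("+a", "+b", "+c"): 20, ("+a", "+b", "-c"): 30,
    ("+a", "-b", "+c"): 50, ("+a", "-b", "-c"): 30,
    ("-a", "+b", "+c"): 30, ("-a", "+b", "-c"): 20,
    ("-a", "-b", "+c"): 80, ("-a", "-b", "-c"): 40,
}
total = sum(counts.values())  # 300

def prob(**events):
    """Fraction of samples matching the given events, e.g. prob(a='+a')."""
    index = {"a": 0, "b": 1, "c": 2}
    matching = sum(n for key, n in counts.items()
                   if all(key[index[v]] == val for v, val in events.items()))
    return matching / total

print(prob(a="+a", b="-b", c="-c"))                         # 30/300 = 0.1
print(prob(a="+a", c="-c"))                                 # 60/300 = 0.2
print(prob(a="-a", b="-b", c="-c") / prob(b="-b", c="-c"))  # 40/70
print(prob(a="+a", b="-b") / prob(a="+a"))                  # 80/130
```

Note that for the conditional queries the division by `total` cancels, so the ratios of `prob` calls are just ratios of counts.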

Page 31

Technique 3: Rejection Sampling

Rejection sampling is the fancy name given to the procedure you just used to compute, e.g., P(-a | -b, -c).

To compute this, you ignore (or “reject”) samples where B = +b or C = +c, since they don’t match the evidence in the query.
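A sketch of rejection sampling on the network from the sampling slides (Pages 25-29); the particular query P(+a | +b, +d), the seed, and the sample count are my choices:

```python
import random

# CPTs from the sampling slides.
P_A = 0.6
P_B = {True: 0.7, False: 0.6}
P_C = {True: 0.4, False: 0.9}
P_D = {(True, True): 0.5, (True, False): 0.6,
       (False, True): 0.2, (False, False): 0.3}

rng = random.Random(0)
accepted = positives = 0
for _ in range(100_000):
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    c = rng.random() < P_C[a]
    d = rng.random() < P_D[(b, c)]
    if not (b and d):       # doesn't match the evidence +b, +d ...
        continue            # ... so reject the sample
    accepted += 1
    positives += a
print(positives / accepted)  # approaches P(+a | +b, +d) = 98/149 ~ 0.658
print(accepted / 100_000)    # only about 36% of the samples were kept
```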

Page 32

Consistency

Rejection sampling is a consistent approximate inference technique.

Consistency means that as the number of samples increases, the estimated value of the probability for a query approaches its true value.

In the limit of infinite samples, consistent sampling techniques give the correct probabilities.

Page 33

Room for Improvement

Efficiency of Rejection Sampling: if you’re interested in a query like P(+a | +b, +c), you’ll reject 5 out of every 6 samples, since only 1 in 6 samples has the right evidence (+b and +c).

So most samples are useless for your query.

Page 34

Technique 4: Likelihood Weighting

(Same network and CPTs as on Page 25.)

Sample generation algorithm:

Initialize: sample ← {}, P(sample) ← 1
For each variable X that has not been assigned, but whose parents have all been assigned:

1. If X is an evidence node:
a. assign X ← the value from the query
b. P(sample) ← P(sample) * P(X | parents(X))
2. Otherwise, assign X as normal; P(sample) unchanged

For this example: Sample: {}  P(sample): 1

At first, A is the only variable whose parents have been assigned (since it has no parents).

r ← 0.3
0.3 < P(+a), so we assign A ← +a

Query of interest: P(+c | +b, +d)


Page 35

Likelihood Weighting

(Same network, CPTs, and algorithm as on Pages 25 and 34.)

For this example: Sample: {+a}  P(sample): 1

B and C have their parents assigned. Let’s do B next.

B is an evidence node, so we choose B ← +b (from the query).
Also, P(+b | +a) = .7, so we update P(sample) ← 0.7.

Query of interest: P(+c | +b, +d)


Page 36

Likelihood Weighting

(Same network, CPTs, and algorithm as on Pages 25 and 34.)

For this example: Sample: {+a, +b}  P(sample): 0.7

C has its parents assigned. It is NOT an evidence node.

r ← .8
.8 >= P(+c | +a), so C ← -c.
P(sample) is NOT updated.

Query of interest: P(+c | +b, +d)


Page 37

Likelihood Weighting

(Same network, CPTs, and algorithm as on Pages 25 and 34.)

For this example: Sample: {+a, +b, -c}  P(sample): 0.7

D has its parents assigned.

How do the sample and P(sample) change?

Query of interest: P(+c | +b, +d)


Page 38

Likelihood Weighting

(Same network, CPTs, and algorithm as on Pages 25 and 34.)

For this example: Sample: {+a, +b, -c, +d}  P(sample): 0.42

Query of interest: P(+c | +b, +d)
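Looping the walkthrough above many times gives the likelihood-weighting estimate. A sketch (variable names, seed, and sample count are mine; from the CPTs, the exact answer works out to 80/149 ≈ 0.537):

```python
import random

# CPTs from the slides.  Evidence: B = +b, D = +d; query: P(+c | +b, +d).
P_A = 0.6
P_B = {True: 0.7, False: 0.6}
P_C = {True: 0.4, False: 0.9}
P_D = {(True, True): 0.5, (True, False): 0.6,
       (False, True): 0.2, (False, False): 0.3}

rng = random.Random(0)
weight_total = weight_with_c = 0.0
for _ in range(50_000):
    w = 1.0
    a = rng.random() < P_A     # A: not evidence -> sample as normal
    b = True                   # B: evidence -> fix to +b ...
    w *= P_B[a]                # ... and multiply P(+b | a) into the weight
    c = rng.random() < P_C[a]  # C: not evidence -> sample as normal
    d = True                   # D: evidence -> fix to +d ...
    w *= P_D[(b, c)]           # ... and multiply P(+d | b, c) into the weight
    weight_total += w
    if c:
        weight_with_c += w
print(weight_with_c / weight_total)  # approaches P(+c | +b, +d) ~ 0.537
```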

Page 39

Likelihood Weighting vs. Rejection Sampling

A    B    C    Count
+a   +b   +c   20
+a   +b   -c   30
+a   -b   +c   50
+a   -b   -c   30
-a   +b   +c   30
-a   +b   -c   20
-a   -b   +c   80
-a   -b   -c   40

Total: 300

Rejection Sampling

A    B    C    Probabilistic Count
-a   +b   +c   23.58
-a   +b   -c   68.3
-a   -b   +c   90.6
-a   -b   -c   40.6

Total: 223.08

Likelihood Weighting, for the query P(+c | -a)

Both are consistent.

Likelihood weighting: requires fewer samples to get good estimates, but solves just one query at a time.

Rejection sampling: needs LOTS of samples, but can answer any query.

Page 40

Further room for improvement

(Same network and CPTs as on Page 25.)

Example query of interest: P(+d | +b, +c)


If we generate samples using likelihood weighting, the choice of sample for D takes into account the evidence.

However, the choice of sample for A does NOT take into account the evidence.

So we may generate lots of samples that are very unlikely, and don’t contribute much to our overall counts.

Quiz: what is P(+a | +b, +c)? And P(-a | +b, +c)?

Page 41

Technique 5: Gibbs Sampling

Named after physicist Josiah Gibbs (you may have heard of Gibbs Free Energy).

This is a special case of a more general algorithm called Metropolis-Hastings, which is itself a special case of Markov-Chain Monte Carlo (MCMC) estimation.

Page 42

Gibbs Sampling

(Same network and CPTs as on Page 25.)

Sample generation algorithm:

Initialize: sample ← {A ← random, +b, -c, D ← random}
Repeat:
1. Pick a non-evidence variable X
2. Get a random number r in the range [0, 1]
3. If r < P(+x | all other variables), set X ← +x
4. Otherwise, set X ← -x
5. Add 1 to the count for this new sample

For this example: Sample: {-a, +b, -c, +d}

A and D are non-evidence. Randomly choose D to re-set.

r ← 0.7
P(+d | -a, +b, -c) = P(+d | +b, -c) = .6
r >= .6, so D ← -d

Query of interest: P(-d | +b, -c)


Page 43

Gibbs Sampling

(Same network, CPTs, and algorithm as on Page 42.)

For this example: Sample: {-a, +b, -c, -d}

A and D are non-evidence. Randomly choose D to re-set.

r ← 0.9
P(+d | -a, +b, -c) = P(+d | +b, -c) = .6
r >= .6, so D ← -d (no change)

Query of interest: P(-d | +b, -c)


Page 44

Gibbs Sampling

(Same network, CPTs, and algorithm as on Page 42.)

For this example: Sample: {-a, +b, -c, -d}

A and D are non-evidence. Randomly choose A to re-set.

r ← 0.3
P(+a | +b, -c, -d) = P(+a | +b, -c) = ?
What is A after this step?

Query of interest: P(-d | +b, -c)


Page 45

Details of Gibbs Sampling

1. To compute P(X | all other variables), it is enough to consider only the Markov Blanket of X:
– X’s parents, X’s children, and the parents of X’s children.
– Everything else will be conditionally independent of X, given its Markov Blanket.

2. Unlike Rejection Sampling and Likelihood Weighting, samples in Gibbs Sampling are NOT independent.

3. Nevertheless, Gibbs Sampling is consistent.

4. It is very common to discard the first N (often N ~= 1000) samples from a Gibbs sampler. The first N samples are called the “burn-in” period.
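Putting the Gibbs slides together into a runnable sketch for the running query P(-d | +b, -c). The helper names, seed, burn-in length, and iteration count are my choices; note that D's Markov blanket is just its parents B and C, and A's is its children B and C.

```python
import random

# CPTs from the slides.  Evidence: B = +b, C = -c; query: P(-d | +b, -c).
P_A = 0.6
P_B = {True: 0.7, False: 0.6}
P_C = {True: 0.4, False: 0.9}
P_D = {(True, True): 0.5, (True, False): 0.6,
       (False, True): 0.2, (False, False): 0.3}

def p_a_given_rest(b, c):
    """P(+a | everything else).  Only A's Markov blanket matters, so this is
    proportional to P(A) * P(b | A) * P(c | A); the P(d | b, c) term cancels."""
    def score(a):
        return ((P_A if a else 1 - P_A)
                * (P_B[a] if b else 1 - P_B[a])
                * (P_C[a] if c else 1 - P_C[a]))
    return score(True) / (score(True) + score(False))

rng = random.Random(0)
b, c = True, False              # evidence variables: fixed, never resampled
a = rng.random() < 0.5          # non-evidence variables: initialized randomly
d = rng.random() < 0.5

burn_in, steps = 1000, 40000
count_not_d = kept = 0
for t in range(steps):
    if rng.random() < 0.5:      # pick a non-evidence variable at random
        a = rng.random() < p_a_given_rest(b, c)
    else:                       # D's blanket is just its parents B and C
        d = rng.random() < P_D[(b, c)]
    if t >= burn_in:            # discard the burn-in period
        kept += 1
        count_not_d += not d
print(count_not_d / kept)       # approaches P(-d | +b, -c) = 1 - .6 = 0.4
```

Unlike the earlier samplers, consecutive samples here differ in at most one variable, which is why they are not independent, yet the long-run counts still converge.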