Bayesian networks

Chapter 14, Sections 1–4

Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA slides © Stuart Russell and Peter Norvig, 2004



Bayesian networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:

P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
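As a concrete illustration (a sketch, not code from the slides), one way to store such a network is a mapping from each variable to its parent list and a CPT keyed by tuples of parent values; the dentist-domain probabilities below are invented purely for illustration.

# One possible encoding of a Bayes net over Boolean variables: each node
# stores its parents and a CPT giving P(node = true | parent values).
# The numbers here are made up for illustration only.
net = {
    "Cavity":    {"parents": [], "cpt": {(): 0.2}},
    "Toothache": {"parents": ["Cavity"],
                  "cpt": {(True,): 0.6, (False,): 0.1}},
    "Catch":     {"parents": ["Cavity"],
                  "cpt": {(True,): 0.9, (False,): 0.2}},
}

def prob(net, var, value, assignment):
    # Look up P(var = value | parents) in the node's CPT.
    row = tuple(assignment[p] for p in net[var]["parents"])
    p_true = net[var]["cpt"][row]
    return p_true if value else 1.0 - p_true

print(prob(net, "Toothache", True, {"Cavity": True}))  # 0.6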


Example

The topology of a network encodes conditional independence assertions:

[Figure: Weather stands alone; Cavity has links to Toothache and Catch]

Weather is independent of the other variables

Toothache and Catch are conditionally independent given Cavity


Example

I’m at work. My neighbor John calls to say my alarm is ringing, but my neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

The network topology reflects our “causal” knowledge:
– a burglar can trigger the alarm
– an earthquake can trigger the alarm
– the alarm can cause Mary to call
– the alarm can cause John to call


Example contd.

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001        P(E) = .002

B  E  |  P(A|B,E)
T  T  |  .95
T  F  |  .94
F  T  |  .29
F  F  |  .001

A  |  P(J|A)
T  |  .90
F  |  .05

A  |  P(M|A)
T  |  .70
F  |  .01


Compactness

A CPT for Boolean Xi with k Boolean parents has 2ᵏ rows for the combinations of parent values

Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)

If each variable has no more than k parents, the complete network requires O(n · 2ᵏ) numbers

I.e., it grows linearly with n, vs. O(2ⁿ) for the full joint distribution

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2⁵ − 1 = 31)
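A quick check of this arithmetic (a sketch; the dict just records how many parents each burglary-net variable has):

# Each Boolean node with k parents needs 2**k numbers in its CPT.
parents_count = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
                 "JohnCalls": 1, "MaryCalls": 1}

network_size = sum(2 ** k for k in parents_count.values())
joint_size = 2 ** len(parents_count) - 1  # full joint over 5 Booleans

print(network_size, joint_size)  # 10 31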


Global semantics

The global semantics defines the full joint distribution

as the product of the local conditional distributions:

P(x1, …, xn) = Πi=1..n P(xi | parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998
≈ 0.00063
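The same product, computed from the CPT numbers on the previous slide (a small sketch, not the book’s code):

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as a chain-rule product over the burglary CPTs.
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(j | A)
p_m = {True: 0.70, False: 0.01}                      # P(m | A)

b, e, a = False, False, True
p = p_j[a] * p_m[a] * p_a[(b, e)] * (1 - p_b) * (1 - p_e)
print(round(p, 5))  # 0.00063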


Markov blanket

Theorem: Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents

[Figure: node X with parents U1, …, Um, children Y1, …, Yn, and the children’s other parents Z1j, …, Znj]
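The blanket can be read directly off the graph; a minimal sketch over the burglary network’s parent lists:

# Markov blanket = parents + children + children's other parents.
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

def markov_blanket(parents, x):
    children = [v for v, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # children's other parents
    blanket.discard(x)
    return blanket

print(sorted(markov_blanket(parents, "Alarm")))
# ['Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls']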


Constructing Bayesian networks

We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics

1. Choose an ordering of variables X1, . . . , Xn

2. For i = 1 to n:
– add Xi to the network
– select parents from X1, …, Xi−1 such that

P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees the global semantics:

P(X1, …, Xn) = Πi=1..n P(Xi | X1, …, Xi−1)   (chain rule)
= Πi=1..n P(Xi | Parents(Xi))   (by construction)
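A sketch of this construction loop in Python; the conditional-independence test is exactly what the procedure leaves open, so same_distribution below is a hypothetical oracle (in practice the modeller answers these questions):

from itertools import combinations

def build_network(order, same_distribution):
    # same_distribution(x, subset, preds) is a hypothetical oracle:
    # does P(x | subset) equal P(x | preds)?
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for k in range(len(preds) + 1):  # smallest sufficient parent set
            subset = next((list(s) for s in combinations(preds, k)
                           if same_distribution(x, list(s), preds)), None)
            if subset is not None:
                parents[x] = subset
                break
    return parents

With the ordering M, J, A, B, E and a truthful oracle, this reproduces the parent sets derived in the following example.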



Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built from this ordering: MaryCalls → JohnCalls; MaryCalls, JohnCalls → Alarm; Alarm → Burglary; Alarm, Burglary → Earthquake]

P(J|M) = P(J)?  No
P(A|J,M) = P(A|J)?  P(A|J,M) = P(A)?  No
P(B|A,J,M) = P(B|A)?  Yes
P(B|A,J,M) = P(B)?  No
P(E|B,A,J,M) = P(E|A)?  No
P(E|B,A,J,M) = P(E|A,B)?  Yes


Example contd.

[Figure: the same network, built from the ordering M, J, A, B, E]

Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Compare with the original burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers


Example contd.

The chosen ordering of the variables can have a big impact on the size of the network! A network built from the ordering M, J, E, B, A (network (b) in the AIMA figure) needs 2⁵ − 1 = 31 numbers: exactly the same as the full joint distribution


Inference tasks

Simple queries: compute the posterior marginal P(Xi | E = e)
e.g., P(Burglary | JohnCalls = true, MaryCalls = true), or shorter, P(B|j,m)

Conjunctive queries: P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e)

Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?


Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

Simple query on the burglary network:

P(B|j,m) = P(B,j,m) / P(j,m)
= α P(B,j,m)
= α Σe Σa P(B,e,a,j,m)

(where e and a are the hidden variables)

Rewrite full joint entries using products of CPT entries:

P(B|j,m) = α Σe Σa P(B) P(e) P(a|B,e) P(j|a) P(m|a)
= α P(B) Σe P(e) Σa P(a|B,e) P(j|a) P(m|a)

Recursive depth-first enumeration: O(n) space, O(dⁿ) time
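A runnable sketch of this recursive enumeration on the burglary network (the encoding is my own, not AIMA’s code):

cpt = {"B": {(): 0.001}, "E": {(): 0.002},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "J": {(True,): 0.90, (False,): 0.05},
       "M": {(True,): 0.70, (False,): 0.01}}
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
order = ["B", "E", "A", "J", "M"]  # topological order

def p(var, value, ev):
    # P(var = value | parents), read off the CPT.
    pt = cpt[var][tuple(ev[x] for x in parents[var])]
    return pt if value else 1 - pt

def enumerate_all(vs, ev):
    # Depth-first sum over the remaining variables vs.
    if not vs:
        return 1.0
    v, rest = vs[0], vs[1:]
    if v in ev:
        return p(v, ev[v], ev) * enumerate_all(rest, ev)
    return sum(p(v, val, ev) * enumerate_all(rest, {**ev, v: val})
               for val in (True, False))

ev = {"J": True, "M": True}
unnorm = [enumerate_all(order, {**ev, "B": b}) for b in (True, False)]
alpha = 1 / sum(unnorm)
print([round(alpha * u, 4) for u in unnorm])  # [0.2842, 0.7158]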


Evaluation tree

[Evaluation tree for α P(b) Σe P(e) Σa P(a|b,e) P(j|a) P(m|a): root P(b) = .001; branches on e with P(e) = .002 and P(¬e) = .998; then on a with P(a|b,e) = .95, P(¬a|b,e) = .05, P(a|b,¬e) = .94, P(¬a|b,¬e) = .06; each leaf multiplies P(j|a) P(m|a) = .90 × .70 or P(j|¬a) P(m|¬a) = .05 × .01]

Enumeration is inefficient: repeated computation, e.g., it computes P(j|a) P(m|a) for each value of e


Inference by variable elimination

Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation

P(B|j,m) = α P(B) Σe P(e) Σa P(a|B,e) P(j|a) P(m|a)

= α f1(B) Σe f2(E) Σa f3(A,B,E) f4(A) f5(A)

(where f1, f2, f4, f5 are 2-element vectors, and f3 is a 2 × 2 × 2 matrix)

Sum out A to get the 2 × 2 matrix f6, and then E to get the 2-vector f7:

f6(B,E) = Σa f3(A,B,E) × f4(A) × f5(A)
= f3(a,B,E) × f4(a) × f5(a) + f3(¬a,B,E) × f4(¬a) × f5(¬a)

f7(B) = Σe f2(E) × f6(B,E) = f2(e) × f6(B,e) + f2(¬e) × f6(B,¬e)

Finally, we get this:

P(B|j,m) = α f1(B) × f7(B)
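The same factor computation as a NumPy sketch, with index 0 = true and 1 = false (an illustration of the steps above, not AIMA’s code):

import numpy as np

f1 = np.array([0.001, 0.999])           # f1(B) = P(B)
f2 = np.array([0.002, 0.998])           # f2(E) = P(E)
f3 = np.empty((2, 2, 2))                # f3(A,B,E) = P(A | B,E)
f3[0] = [[0.95, 0.94], [0.29, 0.001]]   # A = true; rows B, columns E
f3[1] = 1 - f3[0]                       # A = false
f4 = np.array([0.90, 0.05])             # f4(A) = P(j | A)
f5 = np.array([0.70, 0.01])             # f5(A) = P(m | A)

f6 = np.einsum("abe,a,a->be", f3, f4, f5)  # sum out A
f7 = f6 @ f2                               # sum out E
answer = f1 * f7
print(answer / answer.sum())               # ≈ [0.2842 0.7158]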


Irrelevant variables

Consider the query P(JohnCalls | Burglary = true)

P(J|b) = α P(b) Σe P(e) Σa P(a|b,e) P(J|a) Σm P(m|a)

Sum over m is identically 1; M is irrelevant to the query

Theorem: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

Here, X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant
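A small sketch of the ancestors test on the burglary graph (Burglary itself also shows up in the computed set, being an ancestor of JohnCalls; the slide lists only the variables outside {X} ∪ E):

parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

def ancestors(parents, nodes):
    # Walk parent links upward from every node in `nodes`.
    seen, stack = set(), [p for n in nodes for p in parents[n]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

relevant = {"JohnCalls"} | {"Burglary"}             # {X} ∪ E
print(sorted(ancestors(parents, relevant)))         # Alarm, Burglary, Earthquake
print("MaryCalls" in ancestors(parents, relevant))  # False, so irrelevant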


Summary

Bayes nets provide a natural representation for (causally induced) conditional independence

Topology + CPTs = compact representation of joint distribution

Generally easy for (non)experts to construct

Probabilistic inference tasks can be computed exactly:
– variable elimination avoids recomputations
– irrelevant variables can be removed, which reduces complexity
