Lectures 06: Uncertainty
TRANSCRIPT
-
7/31/2019 Lectures 06 Uncertainty
1/154
Fahiem Bacchus, University of Toronto1
Reasoning Under Uncertainty
` This material is covered in chapters 13 and 14. Chapter 13 gives some basic background on probability from the point of view of AI. Chapter 14 talks about Bayesian Networks, which will be a main topic for us.
CSC384h: Intro to Artificial Intelligence
-
Uncertainty
` With First Order Logic we examined a mechanism for representing true facts and for reasoning to new true facts.
` The emphasis on truth is sensible in some domains.
` But in many domains it is not sufficient to deal only with true facts. We have to gamble.
` E.g., we don't know for certain what the traffic will be like on a trip to the airport.
-
Uncertainty
` But how do we gamble rationally?
` If we must arrive at the airport at 9pm on a week night we could safely leave for the airport an hour before.
` Some probability of the trip taking longer, but the probability is low.
` If we must arrive at the airport at 4:30pm on Friday we most likely need 1 hour or more to get to the airport.
` Relatively high probability of it taking 1.5 hours.
-
Uncertainty
` To act rationally under uncertainty we must be able to evaluate how likely certain things are.
` With FOL a fact F is only useful if it is known to be true or false.
` But we need to be able to evaluate how likely it is that F is true.
` By weighing likelihoods of events (probabilities) we can develop mechanisms for acting rationally under uncertainty.
-
Dental Diagnosis example.
` In FOL we might formulate
∀P. symptom(P,toothache) →
    disease(P,cavity) ∨
    disease(P,gumDisease) ∨
    disease(P,foodStuck) ∨ …
` When do we stop?
` Cannot list all possible causes.
` We also want to rank the possibilities. We don't want to start drilling for a cavity before checking for more likely causes first.
-
Probability (over Finite Sets)
` Probability is a function defined over a set of events U, often called the universe of events.
` It assigns a value Pr(e) to each event e ∈ U, in the range [0,1].
` It assigns a value to every set of events F by summing the probabilities of the members of that set:
Pr(F) = Σ_{e ∈ F} Pr(e)
` Pr(U) = 1, i.e., the sum over all events is 1.
` Therefore: Pr({}) = 0 and
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
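These definitions can be checked directly in code. Below is a minimal sketch that uses a fair six-sided die as a hypothetical universe U; the die and the events A, B are my own illustration, not from the slides.

```python
from fractions import Fraction

# Hypothetical universe of events: the outcome of one fair die roll.
U = {1, 2, 3, 4, 5, 6}
Pr_event = {e: Fraction(1, 6) for e in U}  # Pr(e) for each atomic event e

def Pr(F):
    """Probability of a set of events F, by summing over its members."""
    return sum(Pr_event[e] for e in F)

A = {1, 2, 3}   # "roll is small"
B = {2, 4, 6}   # "roll is even"

assert Pr(U) == 1                                 # sum over all events is 1
assert Pr(set()) == 0                             # Pr({}) = 0
assert Pr(A | B) == Pr(A) + Pr(B) - Pr(A & B)     # inclusion-exclusion
```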
-
Probability in General
` Given a set U (universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the Axioms of Probability:
1. Pr(U) = 1
2. Pr(A) ∈ [0,1]
3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Note: if A ∩ B = {} then Pr(A ∪ B) = Pr(A) + Pr(B)
-
Probability over Feature Vectors
` If we had Pr of every atomic event (full instantiation of the variables) we could compute the probability of any other set (including sets that cannot be specified by some set of variable values).
` E.g.
` {V1 = a} = set of all events where V1 = a
` Pr({V1 = a}) =
Σ_{x2 ∈ Dom[V2]} Σ_{x3 ∈ Dom[V3]} … Σ_{xn ∈ Dom[Vn]} Pr(V1=a, V2=x2, V3=x3, …, Vn=xn)
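This marginalization can be sketched in a few lines. The three variables, their domains, and the joint probabilities below are all invented for illustration.

```python
from itertools import product

# Hypothetical domains for three variables (made up, not from the slides).
dom_V1, dom_V2, dom_V3 = ["a", "b"], [0, 1], [0, 1]

# A full joint: a probability for every atomic event (numbers sum to 1).
probs = [0.10, 0.20, 0.05, 0.15, 0.10, 0.10, 0.20, 0.10]
joint = dict(zip(product(dom_V1, dom_V2, dom_V3), probs))

# Pr({V1 = a}): sum the joint over all values of the remaining variables.
pr_V1_a = sum(joint[("a", x2, x3)] for x2 in dom_V2 for x3 in dom_V3)
# = 0.10 + 0.20 + 0.05 + 0.15 = 0.5 with these made-up numbers
```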
-
Conditional Probabilities.
` In logic one has implication to express conditional statements.
∀X. apple(X) → goodToEat(X)
` This assertion only provides useful information about apples.
` With probabilities one has access to a different way of capturing conditional information: conditional probabilities.
` It turns out that conditional probabilities are essential for both representing and reasoning with probabilistic information.
-
Conditional Probabilities
` Say that A is a set of events such that Pr(A) > 0.
` Then one can define a conditional probability wrt the event A:
Pr(B|A) = Pr(B ∩ A)/Pr(A)
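A minimal sketch of this definition, again on a made-up universe (a fair die; the events are my own illustration):

```python
# Pr(B|A) = Pr(B n A) / Pr(A), computed from a toy event space.
U = {1, 2, 3, 4, 5, 6}
weights = {e: 1 / 6 for e in U}        # fair die

def pr(S):
    """Probability of a set of events S."""
    return sum(weights[e] for e in S)

A = {2, 4, 6}                          # "even"
B = {4, 5, 6}                          # "at least 4"

pr_B_given_A = pr(B & A) / pr(A)       # = (2/6) / (3/6) = 2/3
```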
-
Conditional Probabilities
[Diagram: B covers about 30% of the entire space, but covers over 80% of A.]
-
Conditional Probabilities
[Diagram: B's probability in the new universe A is 0.8.]
-
Conditional Probabilities
` A conditional probability is a probability function, but now over A instead of over the entire space.
` Pr(A|A) = 1
` Pr(B|A) ∈ [0,1]
` Pr(C ∪ B|A) = Pr(C|A) + Pr(B|A) − Pr(C ∩ B|A)
-
Properties and Sets
` In First Order Logic, properties like tall(X) are interpreted as a set of individuals (the individuals that are tall).
` Similarly any set of events A can be interpreted as a property: the set of events with property A.
` When we specify big(X) ∧ tall(X) in FOL, we interpret this as the set of individuals that lie in the intersection of the sets big(X) and tall(X).
-
Independence
` It could be that the density of B on A is identical to its density on the entire set.
` Density: pick an element at random from the entire set. How likely is it that the picked element is in the set B?
` Alternately, the density of B on A could be much different than its density on the whole space.
` In the first case we say that B is independent of A, while in the second case B is dependent on A.
-
Independence
[Diagram: the density of B on A, shown for an independent case and a dependent case.]
-
Independence Definition
A and B are independent properties:
Pr(B|A) = Pr(B)
A and B are dependent:
Pr(B|A) ≠ Pr(B)
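The definition can be checked on a toy space. The two fair coin flips and the events below are my own illustration:

```python
from itertools import product

U = list(product("HT", "HT"))          # outcomes of two fair coin flips
p = {u: 0.25 for u in U}

def pr(S):
    """Probability of a set of outcomes S."""
    return sum(p[u] for u in S)

A = {u for u in U if u[0] == "H"}      # first flip heads
B = {u for u in U if u[1] == "H"}      # second flip heads
C = {("H", "H")}                       # both flips heads

# Pr(B|A) = Pr(B): the flips are independent.
independent = abs(pr(A & B) / pr(A) - pr(B)) < 1e-9
# Pr(B|C) = 1 != Pr(B) = 0.5: B and C are dependent.
dependent = abs(pr(B & C) / pr(C) - pr(B)) > 1e-9
```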
-
Implications for Knowledge
` Say that we have picked an element from the entire set. Then we find out that this element has property A (i.e., is a member of the set A).
` Does this tell us anything more about how likely it is that the element also has property B?
` If B is independent of A then we have learned nothing new about the likelihood of the element being a member of B.
-
Conditional Independence
` Say we have already learned that the randomly picked element has property A.
` We want to know whether or not the element has property B: Pr(B|A) expresses the probability of this being true.
` Now we learn that the element also has property C. Does this give us more information about B-ness? Pr(B|A ∩ C) expresses the probability of this being true under the additional information.
-
Conditional Independence
` If
Pr(B|A ∩ C) = Pr(B|A)
then we have not gained any additional information from knowing that the element is also a member of the set C.
` In this case we say that B is conditionally independent of C given A.
` That is, once we know A, additionally knowing C is irrelevant (wrt whether or not B is true).
` Conditional independence is independence in the conditional probability space Pr(·|A).
` Note we could have Pr(B|C) ≠ Pr(B). But once we learn A, C becomes irrelevant.
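The situation in the last bullet (Pr(B|C) ≠ Pr(B), yet C irrelevant once A is known) can be built concretely. All numbers below are invented; the joint is constructed so that B and C are conditionally independent given A.

```python
from itertools import product

pA = 0.5
pB_given_A = {True: 0.8, False: 0.2}   # Pr(b | A) for each value of A
pC_given_A = {True: 0.9, False: 0.1}   # Pr(c | A) for each value of A

joint = {}
for a, b, c in product([True, False], repeat=3):
    pa = pA if a else 1 - pA
    pb = pB_given_A[a] if b else 1 - pB_given_A[a]
    pc = pC_given_A[a] if c else 1 - pC_given_A[a]
    joint[(a, b, c)] = pa * pb * pc

def pr(pred):
    """Probability of the set of events (a, b, c) satisfying pred."""
    return sum(p for e, p in joint.items() if pred(e))

# Once A is known, C adds nothing: Pr(b | a, c) = Pr(b | a)
pr_b_given_ac = pr(lambda e: e[0] and e[1] and e[2]) / pr(lambda e: e[0] and e[2])
pr_b_given_a = pr(lambda e: e[0] and e[1]) / pr(lambda e: e[0])
assert abs(pr_b_given_ac - pr_b_given_a) < 1e-9

# Unconditionally, though, learning c does change our belief in b:
pr_b_given_c = pr(lambda e: e[1] and e[2]) / pr(lambda e: e[2])
pr_b = pr(lambda e: e[1])
assert abs(pr_b_given_c - pr_b) > 1e-3
```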
-
Computational Impact of Independence
` We will see in more detail how independence allows us to speed up computation. But the fundamental insight is that if A and B are independent properties then
Pr(A ∩ B) = Pr(B) * Pr(A)
Proof:
Pr(B|A) = Pr(B)               independence
Pr(A ∩ B)/Pr(A) = Pr(B)       definition
Pr(A ∩ B) = Pr(B) * Pr(A)
-
Computational Impact of Independence
` Similarly for conditional independence:
Pr(B|C ∩ A) = Pr(B|A) ⟹ Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)
Proof:
Pr(B|C ∩ A) = Pr(B|A)                                      independence
Pr(B ∩ C ∩ A)/Pr(C ∩ A) = Pr(B ∩ A)/Pr(A)                  defn.
Pr(B ∩ C ∩ A)/Pr(A) = Pr(C ∩ A)/Pr(A) * Pr(B ∩ A)/Pr(A)
Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)                            defn.
-
Computational Impact of Independence
` Conditional independence allows us to break up our computation into distinct parts
Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)
` And it also allows us to ignore certain pieces of information
Pr(B|A ∩ C) = Pr(B|A)
-
Bayes Rule
` Bayes rule is a simple mathematical fact. But it has great implications wrt how probabilities can be reasoned with.
` Pr(Y|X) = Pr(X|Y)Pr(Y)/Pr(X)
Proof:
Pr(Y|X) = Pr(Y ∩ X)/Pr(X)
        = Pr(Y ∩ X)/Pr(X) * Pr(Y)/Pr(Y)
        = Pr(Y ∩ X)/Pr(Y) * Pr(Y)/Pr(X)
        = Pr(X|Y)Pr(Y)/Pr(X)
-
Bayes Rule
` Bayes rule allows us to use a supplied conditional probability in both directions.
` E.g., from treating patients with heart disease we might be able to estimate the value of
Pr(high_Cholesterol | heart_disease)
` With Bayes rule we can turn this around into a predictor for heart disease
Pr(heart_disease | high_Cholesterol)
` Now with a simple blood test we can determine high_Cholesterol and use this to help estimate the likelihood of heart disease.
-
Bayes Rule
` For this to work we have to deal with the other factors as well
Pr(heart_disease | high_Cholesterol)
  = Pr(high_Cholesterol | heart_disease) * Pr(heart_disease)/Pr(high_Cholesterol)
` We will return to this later.
-
Bayes Rule Example
` Disease ∈ {malaria, cold, flu}; Symptom = fever
` Must compute Pr(D | fever) to prescribe treatment
` Why not assess this quantity directly?
` Pr(mal | fever) is not natural to assess; Pr(fever | mal) reflects the underlying causal mechanism
` Pr(mal | fever) is not stable: a malaria epidemic changes this quantity (for example)
` So we use Bayes rule:
` Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)
-
Bayes Rule
` Pr(mal | fever) = Pr(fever | mal)Pr(mal)/Pr(fever)
` What about Pr(mal)?
` This is the prior probability of malaria, i.e., before you exhibited a fever, and with it we can account for other factors, e.g., a malaria epidemic, or recent travel to a malaria risk zone.
-
Bayes Rule
` Pr(mal | fever) = Pr(fever | mal)Pr(mal)/Pr(fever)
` What about Pr(fever)?
` We compute
Pr(fever|mal)Pr(mal),
Pr(fever|cold)Pr(cold),
Pr(fever|flu)Pr(flu).
` I.e., solve the same problem for each possible cause of fever.
` Pr(fever|mal)Pr(mal) = Pr(fever ∩ mal)/Pr(mal) * Pr(mal) = Pr(fever ∩ mal)
` Similarly we can obtain
Pr(fever ∩ cold)
Pr(fever ∩ flu)
-
Bayes Rule
` Say that in our example, mal, cold and flu are the only possible causes of fever (Pr(fev | ¬m ∧ ¬c ∧ ¬fl) = 0) and they are mutually exclusive. Then
Pr(fev) = Pr(m ∧ fev) + Pr(c ∧ fev) + Pr(fl ∧ fev)
` So we can sum the numbers we obtained for each disease to obtain the last factor in Bayes Rule: Pr(fever).
` Then we divide each answer by Pr(fever) to get the final probabilities Pr(mal|fever), Pr(cold|fever), Pr(flu|fever).
` This summing and dividing by the sum ensures that Pr(mal|fever) + Pr(cold|fever) + Pr(flu|fever) = 1, and is called normalizing.
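The normalization recipe can be sketched as follows. The priors and likelihoods below are invented numbers; only the sum-then-normalize method comes from the slides.

```python
# Invented numbers for illustration.
priors = {"mal": 0.001, "cold": 0.20, "flu": 0.05}       # Pr(d)
likelihood = {"mal": 0.95, "cold": 0.30, "flu": 0.80}    # Pr(fever | d)

# Pr(fever & d) = Pr(fever | d) * Pr(d) for each candidate cause d
joint = {d: likelihood[d] * priors[d] for d in priors}

# If these are the only, mutually exclusive, causes of fever:
pr_fever = sum(joint.values())

# Normalizing: divide each joint term by Pr(fever) to get Pr(d | fever)
posterior = {d: joint[d] / pr_fever for d in priors}
```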
-
Chain Rule
` Pr(A1 ∩ A2 ∩ … ∩ An)
  = Pr(A1 | A2 ∩ … ∩ An) * Pr(A2 | A3 ∩ … ∩ An) * … * Pr(An-1 | An) * Pr(An)
Proof:
Pr(A1 | A2 ∩ … ∩ An) * Pr(A2 | A3 ∩ … ∩ An) * … * Pr(An-1 | An) * Pr(An)
= Pr(A1 ∩ A2 ∩ … ∩ An)/Pr(A2 ∩ … ∩ An) *
  Pr(A2 ∩ … ∩ An)/Pr(A3 ∩ … ∩ An) * … *
  Pr(An-1 ∩ An)/Pr(An) * Pr(An)
= Pr(A1 ∩ A2 ∩ … ∩ An)        (the product telescopes)
-
Variable Independence
` Recall that we will be mainly dealing with probabilities over feature vectors.
` We have a set of variables, each with a domain of values.
` It could be that {V1=a} and {V2=b} are independent:
Pr(V1=a ∧ V2=b) = Pr(V1=a) * Pr(V2=b)
` It could also be that {V1=b} and {V2=b} are not independent:
Pr(V1=b ∧ V2=b) ≠ Pr(V1=b) * Pr(V2=b)
-
Variable Independence
` If you know the value of Z (whatever it is), learning Y's value (whatever it is) does not influence your beliefs about any of X's values.
` These definitions differ from earlier ones, which talk about particular sets of events being independent. Variable independence is a concise way of stating a number of individual independencies.
-
What does independence buy us?
` Suppose (say, boolean) variables X1, X2, …, Xn are mutually independent (i.e., every subset is variable independent of every other subset)
` We can specify the full joint distribution (probability function over all vectors of values) using only n parameters (linear) instead of 2^n − 1 (exponential)
` How? Simply specify Pr(X1), …, Pr(Xn) (i.e., Pr(Xi=true) for all i)
` From this we can recover the probability of any primitive event easily (or any conjunctive query),
e.g. Pr(x1 ∧ ¬x2 ∧ x3 ∧ x4) = Pr(x1) (1−Pr(x2)) Pr(x3) Pr(x4)
` We can condition on an observed value xk (or ¬xk) trivially:
Pr(x1 ∧ ¬x2 | x3) = Pr(x1) (1−Pr(x2))
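Recovering any primitive event from n marginals can be sketched as follows; the four marginal values are made up for illustration.

```python
# With n mutually independent boolean variables, the n numbers Pr(Xi=true)
# determine the whole joint. Made-up marginals:
pr_true = [0.7, 0.2, 0.9, 0.5]        # Pr(X1), Pr(X2), Pr(X3), Pr(X4)

def pr_event(assignment):
    """Probability of a full assignment, e.g. (True, False, True, True)."""
    p = 1.0
    for pi, value in zip(pr_true, assignment):
        p *= pi if value else 1 - pi
    return p

# Pr(x1 & ~x2 & x3 & x4) = Pr(x1)(1 - Pr(x2))Pr(x3)Pr(x4)
p = pr_event((True, False, True, True))   # 0.7 * 0.8 * 0.9 * 0.5
```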
-
The Value of Independence
`Complete independence reduces bothrepresentation of joint and inference from O(2n)to O(n)!
`Unfortunately, such complete mutualindependence is very rare. Most realisticdomains do not exhibit this property.
`Fortunately, most domains do exhibit a fairamount of conditional independence. And we
can exploit conditional independence forrepresentation and inference as well.
`Bayesian networks do just this
-
An Aside on Notation
` Pr(X) for variable X (or set of variables) refers to the (marginal) distribution over X.
` It specifies Pr(X=d) for all d ∈ Dom[X]
` Note
Σ_{d ∈ Dom[X]} Pr(X=d) = 1
(every vector of values must be in one of the sets {X=d}, d ∈ Dom[X])
` Also
Pr(X=d1 ∧ X=d2) = 0 for all d1, d2 ∈ Dom[X], d1 ≠ d2
(no vector of values contains two different values for X).
-
An Aside on Notation
` Pr(X|Y) refers to a family of conditional distributions over X, one for each y ∈ Dom[Y].
` For each d ∈ Dom[Y], Pr(X|Y) specifies a distribution over the values of X:
Pr(X=d1|Y=d), Pr(X=d2|Y=d), …, Pr(X=dn|Y=d)
for Dom[X] = {d1, d2, …, dn}.
` Distinguish between Pr(X) -- which is a distribution -- and Pr(xi) (xi ∈ Dom[X]) -- which is a number. Think of Pr(X) as a function that accepts any xi ∈ Dom[X] as an argument and returns Pr(xi).
` Similarly, think of Pr(X|Y) as a function that accepts any xi ∈ Dom[X] and yk ∈ Dom[Y] and returns Pr(xi | yk). Note that Pr(X|Y) is not a single distribution; rather it denotes the family of distributions (over X) induced by the different yk ∈ Dom[Y].
-
Exploiting Conditional Independence
` Let's see what conditional independence buys us
` Consider a story:
` If Craig woke up too early E, Craig probably needs coffee C; if C, Craig needs coffee, he's likely angry A. If A, there is an increased chance of an aneurysm (burst blood vessel) B. If B, Craig is quite likely to be hospitalized H.
[Diagram: E → C → A → B → H]
E - Craig woke too early    A - Craig is angry    H - Craig hospitalized
C - Craig needs coffee      B - Craig burst a blood vessel
-
Cond'l Independence in our Story
` If you learned any of E, C, A, or B, your assessment of Pr(H) would change.
` E.g., if any of these are seen to be true, you would increase Pr(h) and decrease Pr(~h).
` So H is not independent of E, or C, or A, or B.
` But if you knew the value of B (true or false), learning the value of E, C, or A would not influence Pr(H). The influence these factors have on H is mediated by their influence on B.
` Craig doesn't get sent to the hospital because he's angry, he gets sent because he's had an aneurysm.
` So H is independent of E, and C, and A, given B.
-
Cond'l Independence in our Story
` Similarly:
` B is independent of E, and C, given A
` A is independent of E, given C
` This means that:
` Pr(H | B, {A,C,E}) = Pr(H|B)
` i.e., for any subset of {A,C,E}, this relation holds
` Pr(B | A, {C,E}) = Pr(B | A)
` Pr(A | C, {E}) = Pr(A | C)
` Pr(C | E) and Pr(E) don't simplify
-
Example Quantification
` Specifying the joint requires only 9 parameters (if we note that half of these are 1 minus the others), instead of 31 for an explicit representation
` Linear in the number of vars instead of exponential!
` Linear generally if the dependence has a chain structure
[Diagram: E → C → A → B → H]
Pr(e) = 0.7    Pr(~e) = 0.3
Pr(c|e) = 0.9    Pr(~c|e) = 0.1    Pr(c|~e) = 0.5    Pr(~c|~e) = 0.5
Pr(a|c) = 0.7    Pr(~a|c) = 0.3    Pr(a|~c) = 0.0    Pr(~a|~c) = 1.0
Pr(b|a) = 0.2    Pr(~b|a) = 0.8    Pr(b|~a) = 0.1    Pr(~b|~a) = 0.9
Pr(h|b) = 0.9    Pr(~h|b) = 0.1    Pr(h|~b) = 0.1    Pr(~h|~b) = 0.9
-
Inference is Easy
` Want to know P(a)? Use the summing out rule:
[Diagram: E → C → A → B → H]
P(a) = Σ_{ci ∈ Dom[C]} Pr(a|ci) Pr(ci)
     = Σ_{ci ∈ Dom[C]} Pr(a|ci) Σ_{ei ∈ Dom[E]} Pr(ci|ei) Pr(ei)
These are all terms specified in our local distributions!
-
Inference is Easy
` Computing P(a) in more concrete terms:
` P(c) = P(c|e)P(e) + P(c|~e)P(~e)
       = 0.9 * 0.7 + 0.5 * 0.3 = 0.78
` P(~c) = P(~c|e)P(e) + P(~c|~e)P(~e) = 0.22
` P(~c) = 1 − P(c), as well
` P(a) = P(a|c)P(c) + P(a|~c)P(~c)
       = 0.7 * 0.78 + 0.0 * 0.22 = 0.546
` P(~a) = 1 − P(a) = 0.454
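The computation above can be transcribed directly, using the CPT values that reproduce the slide's results (0.78 and 0.546):

```python
# Summing out along the chain E -> C -> A.
pr_e = 0.7
pr_c_given_e, pr_c_given_not_e = 0.9, 0.5
pr_a_given_c, pr_a_given_not_c = 0.7, 0.0

# Sum out E to get P(c), then sum out C to get P(a).
pr_c = pr_c_given_e * pr_e + pr_c_given_not_e * (1 - pr_e)   # 0.78
pr_a = pr_a_given_c * pr_c + pr_a_given_not_c * (1 - pr_c)   # 0.546
```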
-
Bayesian Networks
` The structure above is a Bayesian network. A BN is a graphical representation of the direct dependencies over a set of variables, together with a set of conditional probability tables quantifying the strength of those influences.
` Bayes nets generalize the above ideas in very interesting ways, leading to effective means of representation and inference under uncertainty.
-
Bayesian Networks
` A BN over variables {X1, X2, …, Xn} consists of:
` a DAG (directed acyclic graph) whose nodes are the variables
` a set of CPTs (conditional probability tables) Pr(Xi | Par(Xi)), one for each Xi
` Key notions (see text for defns, all are intuitive):
` parents of a node: Par(Xi)
` children of a node
` descendants of a node
` ancestors of a node
` family: set of nodes consisting of Xi and its parents
` CPTs are defined over families in the BN
-
Example (Binary valued Variables)
` A couple of CPTs are shown
` The explicit joint requires 2^11 − 1 = 2047 parameters
` The BN requires only 27 parameters (the number of entries for each CPT is listed)
-
Semantics of Bayes Nets.
` A Bayes net specifies that the joint distribution over the variables in the net can be written as the following product decomposition:
Pr(X1, X2, …, Xn) = Pr(Xn | Par(Xn)) * Pr(Xn-1 | Par(Xn-1)) * … * Pr(X1 | Par(X1))
` This equation holds for any set of values d1, d2, …, dn for the variables X1, X2, …, Xn.
-
Semantics of Bayes Nets.
` E.g., say we have X1, X2, X3 each with domain Dom[Xi] = {a, b, c} and we have
Pr(X1,X2,X3) = P(X3|X2) P(X2) P(X1)
Then
Pr(X1=a, X2=a, X3=a) = P(X3=a|X2=a) P(X2=a) P(X1=a)
Pr(X1=a, X2=a, X3=b) = P(X3=b|X2=a) P(X2=a) P(X1=a)
Pr(X1=a, X2=a, X3=c) = P(X3=c|X2=a) P(X2=a) P(X1=a)
Pr(X1=a, X2=b, X3=a) = P(X3=a|X2=b) P(X2=b) P(X1=a)
-
Semantics of Bayes Nets.
` Note that this means we can compute the probability of any setting of the variables using only the information contained in the CPTs of the network.
-
Constructing a Bayes Net
` It is always possible to construct a Bayes net to represent any distribution over the variables X1, X2, …, Xn, using any ordering of the variables.
Take any ordering of the variables (say, the order given). From the chain rule we obtain
Pr(X1, …, Xn) = Pr(Xn|X1, …, Xn-1) Pr(Xn-1|X1, …, Xn-2) … Pr(X1)
Now for each Xi go through its conditioning set X1, …, Xi-1, and iteratively remove all variables Xj such that Xi is conditionally independent of Xj given the remaining variables. Do this until no more variables can be removed.
The final product will specify a Bayes net.
-
Constructing a Bayes Net
` The end result will be a product decomposition/Bayes net
Pr(Xn | Par(Xn)) Pr(Xn-1 | Par(Xn-1)) … Pr(X1)
` Now we specify the numeric values associated with each term Pr(Xi | Par(Xi)) in a CPT.
` Typically we represent the CPT as a table mapping each setting of {Xi, Par(Xi)} to the probability of Xi taking that particular value given that the variables in Par(Xi) have their specified values.
` If each variable has d different values:
` We will need a table of size d^|{Xi} ∪ Par(Xi)|.
` That is, exponential in the size of the parent set.
` Note that the original chain rule
Pr(X1, …, Xn) = Pr(Xn|X1, …, Xn-1) Pr(Xn-1|X1, …, Xn-2) … Pr(X1)
requires as much space to represent as specifying the probability of each individual event.
-
Causal Intuitions
` The BN can be constructed using an arbitrary ordering of the variables.
` However, some orderings will yield BNs with very large parent sets. This requires exponential space, and (as we will see later) exponential time to perform inference.
` Empirically, and conceptually, a good way to construct a BN is to use an ordering based on causality. This often yields a more natural and compact BN.
-
Causal Intuitions
` Malaria, the flu and a cold all cause aches. So use the ordering in which causes come before effects:
Malaria, Flu, Cold, Aches
Pr(M,F,C,A) = Pr(A|M,F,C) Pr(C|M,F) Pr(F|M) Pr(M)
` Each of these diseases affects the probability of aches, so the first conditional probability does not change.
` It is reasonable to assume that these diseases are independent of each other: having or not having one does not change the probability of having the others. So
Pr(C|M,F) = Pr(C)
Pr(F|M) = Pr(F)
-
Burglary Example
# of Params: 1 + 1 + 4 + 2 + 2 = 10 (vs. 2^5 − 1 = 31)
[Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]
` A burglary can set the alarm off
` An earthquake can set the alarm off
` The alarm can cause Mary to call
` The alarm can cause John to call
-
Example of Constructing a Bayes Network
` Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
-
Example continued
` Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?
-
Example continued
` Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)?
P(B | A, J, M) = P(B)?
-
Example continued
` Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)?
P(E | B, A, J, M) = P(E | A, B)?
-
Example continued
` Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
-
Inference in Bayes Nets
` Given a Bayes net
Pr(X1, X2, …, Xn) = Pr(Xn | Par(Xn)) * Pr(Xn-1 | Par(Xn-1)) * … * Pr(X1 | Par(X1))
` And some evidence E = {a set of values for some of the variables}, we want to compute the new probability distribution
Pr(Xk | E)
` That is, we want to figure out Pr(Xk = d | E) for all d ∈ Dom[Xk]
-
Inference in Bayes Nets
` Other types of examples are: computing the probability of different diseases given symptoms, computing the probability of hail storms given different meteorological evidence, etc.
` In such cases getting a good estimate of the probability of the unknown event allows us to respond more effectively (gamble rationally)
-
Inference in Bayes Nets
` In the Alarm example we have
Pr(BreakIn, Earthquake, Radio, Sound) =
Pr(Earthquake) * Pr(BreakIn) * Pr(Radio|Earthquake) * Pr(Sound|BreakIn, Earthquake)
` And, e.g., we want to compute things like
Pr(BreakIn=true | Radio=false, Sound=true)
-
Variable Elimination
` Variable elimination uses the product decomposition and the summing out rule to compute posterior probabilities from the information (CPTs) already in the network.
-
Example
Pr(A,B,C,D,E,F,G,H,I,J,K) =
Pr(A) Pr(B) Pr(C|A) Pr(D|A,B) Pr(E|C) Pr(F|D) Pr(G) Pr(H|E,F) Pr(I|F,G) Pr(J|H,I) Pr(K|I)
Say that E = {H=true, I=false}, and we want to know Pr(D|h,-i) (h: H is true, -h: H is false)
1. Write as a sum for each value of D
Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,G,h,-i,J,K) = Pr(d,h,-i)
Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,-d,E,F,G,h,-i,J,K) = Pr(-d,h,-i)
-
Example
Pr(d,h,-i) = Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,G,h,-i,J,K)
Use the Bayes net product decomposition to rewrite the summation:
Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,G,h,-i,J,K)
= Σ_{A,B,C,E,F,G,J,K} Pr(A) Pr(B) Pr(C|A) Pr(d|A,B) Pr(E|C) Pr(F|d) Pr(G) Pr(h|E,F) Pr(-i|F,G) Pr(J|h,-i) Pr(K|-i)
Now rearrange the summations so that we are not summing over terms that do not depend on the summed variable.
-
Example
= Σ_{A,B,C,E,F,G,J,K} Pr(A) Pr(B) Pr(C|A) Pr(d|A,B) Pr(E|C) Pr(F|d) Pr(G) Pr(h|E,F) Pr(-i|F,G) Pr(J|h,-i) Pr(K|-i)
= Σ_A Pr(A) Σ_B Pr(B) Σ_C Pr(C|A) Pr(d|A,B) Σ_E Pr(E|C) Σ_F Pr(F|d) Σ_G Pr(G) Pr(h|E,F) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
= Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
-
Example
` Now start computing.
Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
Σ_K Pr(K|-i) = Pr(k|-i) + Pr(-k|-i) = c1
Σ_J Pr(J|h,-i) c1 = c1 Σ_J Pr(J|h,-i) = c1 (Pr(j|h,-i) + Pr(-j|h,-i)) = c1 c2
-
Example
Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
c1 c2 Σ_G Pr(G) Pr(-i|F,G) = c1 c2 (Pr(g) Pr(-i|F,g) + Pr(-g) Pr(-i|F,-g))
!! But Pr(-i|F,g) depends on the value of F, so this is not a single number.
-
Example
= Pr(a) Pr(b) Pr(d|a,b) Σ_C Pr(C|a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(a) Pr(-b) Pr(d|a,-b) Σ_C Pr(C|a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(-a) Pr(b) Pr(d|-a,b) Σ_C Pr(C|-a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(-a) Pr(-b) Pr(d|-a,-b) Σ_C Pr(C|-a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
-
Example
` Yikes! The size of the sum is doubling as we expand each variable (into v and -v). This approach has exponential complexity.
` But let's look a bit closer.
-
Variable Elimination (VE)
` VE works from the inside out, summing out K, then J, then G, …, as we tried to before.
` When we tried to sum out G in
Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
c1 c2 Σ_G Pr(G) Pr(-i|F,G) = c1 c2 (Pr(g) Pr(-i|F,g) + Pr(-g) Pr(-i|F,-g))
we found that Pr(-i|F,g) depends on the value of F; it wasn't a single number.
` However, we can still continue with the computation by computing two different numbers, one for each value of F (f, -f)!
-
Variable Elimination (VE)
` t(f) = c1 c2 Σ_G Pr(G) Pr(-i|f,G)
  t(-f) = c1 c2 Σ_G Pr(G) Pr(-i|-f,G)
` t(f) = c1 c2 (Pr(g) Pr(-i|f,g) + Pr(-g) Pr(-i|f,-g))
` t(-f) = c1 c2 (Pr(g) Pr(-i|-f,g) + Pr(-g) Pr(-i|-f,-g))
` Now we sum out F
-
Variable Elimination (VE)
Σ_F Pr(F|d) Pr(h|E,F) t(F) = Pr(f|d) Pr(h|E,f) t(f) + Pr(-f|d) Pr(h|E,-f) t(-f)
` This is a function of E, so we obtain two new numbers:
s(e) = Pr(f|d) Pr(h|e,f) t(f) + Pr(-f|d) Pr(h|e,-f) t(-f)
s(-e) = Pr(f|d) Pr(h|-e,f) t(f) + Pr(-f|d) Pr(h|-e,-f) t(-f)
-
The Product of Two Factors
` Let f(X,Y) & g(Y,Z) be two factors with variables Y in common.
` The product of f and g, denoted h = f x g (or sometimes just h = fg), is defined:
h(X,Y,Z) = f(X,Y) x g(Y,Z)
f(A,B) g(B,C) h(A,B,C)
ab 0.9 bc 0.7 abc 0.63 ab~c 0.27
a~b 0.1 b~c 0.3 a~bc 0.08 a~b~c 0.02
~ab 0.4 ~bc 0.8 ~abc 0.28 ~ab~c 0.12
~a~b 0.6 ~b~c 0.2 ~a~bc 0.48~a~b~
c0.12
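The product operation can be sketched directly. This is a minimal illustration, not the lecture's code: a factor is an ordered variable list plus a dict from boolean assignment tuples to numbers, and the values reproduce the f(A,B) x g(B,C) table above.

```python
from itertools import product

# Minimal factor sketch: a factor is (list of variables, dict from
# assignment tuples to numbers).
def multiply(vars_f, f, vars_g, g):
    """Pointwise product: h(X,Y,Z) = f(X,Y) * g(Y,Z)."""
    vars_h = vars_f + [v for v in vars_g if v not in vars_f]
    h = {}
    for vals in product([True, False], repeat=len(vars_h)):
        asg = dict(zip(vars_h, vals))
        h[vals] = (f[tuple(asg[v] for v in vars_f)]
                   * g[tuple(asg[v] for v in vars_g)])
    return vars_h, h

# The table above: f(A,B) x g(B,C) = h(A,B,C).
f = {(True, True): 0.9, (True, False): 0.1,
     (False, True): 0.4, (False, False): 0.6}
g = {(True, True): 0.7, (True, False): 0.3,
     (False, True): 0.8, (False, False): 0.2}
vars_h, h = multiply(['A', 'B'], f, ['B', 'C'], g)
print(round(h[(True, True, True)], 2))   # abc: 0.9 * 0.7 = 0.63
```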
Restricting a Factor
` Let f(X,Y) be a factor with variable X (Y is a set of variables).
` We restrict factor f to X=a by setting X to the value a and deleting incompatible elements of f's domain. Define h = fX=a as: h(Y) = f(a,Y)
f(A,B)        h(B) = fA=a
ab    0.9     b    0.9
a~b   0.1     ~b   0.1
~ab   0.4
~a~b  0.6
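Restriction is just as easy to sketch under the same hypothetical dict representation (again, not the lecture's code): keep only keys compatible with X=a and drop X's position from each key.

```python
# Restriction sketch: h = fX=a, i.e. h(Y) = f(a, Y).
def restrict(vars_f, f, var, value):
    """Fix `var` to `value` and drop it from the factor."""
    i = vars_f.index(var)
    vars_h = vars_f[:i] + vars_f[i+1:]
    h = {key[:i] + key[i+1:]: p for key, p in f.items() if key[i] == value}
    return vars_h, h

# The table above: restrict f(A,B) to A=a.
f = {(True, True): 0.9, (True, False): 0.1,
     (False, True): 0.4, (False, False): 0.6}
vars_h, h = restrict(['A', 'B'], f, 'A', True)
print(vars_h, h)   # ['B'] {(True,): 0.9, (False,): 0.1}
```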
VE: Example
Factors: f1(A), f2(B), f3(A,B,C), f4(C,D)
Restriction: replace f4(C,D) with f5(C) = f4(C,d)
Step 1: Compute & Add f6(A,B) = ΣC f5(C) f3(A,B,C)
Remove: f3(A,B,C), f5(C)
Step 2: Compute & Add f7(A) = ΣB f6(A,B) f2(B)
Remove: f6(A,B), f2(B)
Last factors: f7(A), f1(A). The product f1(A) x f7(A) is the (unnormalized) posterior. So P(A|d) = α f1(A) x f7(A), where α = 1/ΣA f1(A)f7(A).
Query: P(A)?
Evidence: D = d
Elim. Order: C, B
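The whole run can be traced in code. The sketch below follows the slide's sequence of operations (restrict to D=d, sum out C then B, then normalize), but the CPT numbers are invented for illustration; only the structure comes from the example.

```python
from itertools import product

def multiply(fs):
    """Product of a list of factors, each given as (vars, table)."""
    vars_h = []
    for vs, _ in fs:
        vars_h += [v for v in vs if v not in vars_h]
    h = {}
    for vals in product([True, False], repeat=len(vars_h)):
        asg = dict(zip(vars_h, vals))
        p = 1.0
        for vs, t in fs:
            p *= t[tuple(asg[v] for v in vs)]
        h[vals] = p
    return vars_h, h

def sum_out(vars_f, f, var):
    """Eliminate `var` by summing it out of the factor."""
    i = vars_f.index(var)
    vars_h = vars_f[:i] + vars_f[i+1:]
    h = {}
    for key, p in f.items():
        k = key[:i] + key[i+1:]
        h[k] = h.get(k, 0.0) + p
    return vars_h, h

# Made-up CPTs; f5(C) plays the role of f4(C, D=d) after restriction.
f1 = (['A'], {(True,): 0.3, (False,): 0.7})
f2 = (['B'], {(True,): 0.6, (False,): 0.4})
f3 = (['A', 'B', 'C'],
      dict(zip(product([True, False], repeat=3),
               [0.9, 0.1, 0.8, 0.2, 0.5, 0.5, 0.2, 0.8])))
f5 = (['C'], {(True,): 0.7, (False,): 0.1})

f6 = sum_out(*multiply([f5, f3]), 'C')     # Step 1: sum out C
f7 = sum_out(*multiply([f6, f2]), 'B')     # Step 2: sum out B
vars_p, p = multiply([f1, f7])             # unnormalized posterior over A
alpha = 1.0 / sum(p.values())
post = {k: alpha * v for k, v in p.items()}
print(post)                                # P(A | d), sums to 1
```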
Numeric Example
` Here's the example with some numbers.
Network: A -> B -> C, with factors f1(A), f2(A,B), f3(B,C).
f4(B) = ΣA f2(A,B) f1(A)
f5(C) = ΣB f3(B,C) f4(B)

f1(A)      f2(A,B)      f3(B,C)      f4(B)      f5(C)
a   0.9    ab    0.9    bc    0.7    b   0.85   c   0.625
~a  0.1    a~b   0.1    b~c   0.3    ~b  0.15   ~c  0.375
           ~ab   0.4    ~bc   0.2
           ~a~b  0.6    ~b~c  0.8
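The slide's numbers can be checked directly; a small sketch using the CPTs from the table:

```python
# Verify the numeric VE example on the chain A -> B -> C.
f1 = {True: 0.9, False: 0.1}                      # Pr(A)
f2 = {(True, True): 0.9, (True, False): 0.1,      # f2(A,B) = Pr(B|A)
      (False, True): 0.4, (False, False): 0.6}
f3 = {(True, True): 0.7, (True, False): 0.3,      # f3(B,C) = Pr(C|B)
      (False, True): 0.2, (False, False): 0.8}

# f4(B) = sum_A f2(A,B) f1(A)
f4 = {b: sum(f2[(a, b)] * f1[a] for a in (True, False)) for b in (True, False)}
# f5(C) = sum_B f3(B,C) f4(B)
f5 = {c: sum(f3[(b, c)] * f4[b] for b in (True, False)) for c in (True, False)}

print({k: round(v, 3) for k, v in f4.items()})   # {True: 0.85, False: 0.15}
print({k: round(v, 3) for k, v in f5.items()})   # {True: 0.625, False: 0.375}
```

Both sums match the f4(B) and f5(C) columns in the table.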
VE: Buckets as a Notational Device
Network factors: f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(E,D,F).
Ordering:C,F,A,B,E,D
1. C:
2. F:
3. A:
4. B:
5. E:
6. D:
VE: Buckets. Place original factors in first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A)
4. B: f2(B)
5. E:
6. D:
Ordering:C,F,A,B,E,D
VE: Eliminate the variables in order, placing new factor in first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B)
5. E: f8(E,D)
6. D:
2. ΣF f6(E,D,F) = f8(E,D)
Ordering:C,F,A,B,E,D
VE: Eliminate the variables in order, placing new factor in first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D:
4. ΣB f2(B) f9(B,D,E) = f10(D,E)
Ordering:C,F,A,B,E,D
VE: Eliminate the variables in order, placing new factor in first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D: f11(D)
5. ΣE f8(E,D) f10(D,E) = f11(D)
f11 is the final answer, once we normalize it.
Ordering:C,F,A,B,E,D
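The bucket bookkeeping can be sketched by tracking only each factor's scope (the set of variables it mentions) and watching where new factors land. The factor list and ordering are the ones above; the actual tables are omitted.

```python
# Bucket-elimination bookkeeping on factor scopes only (no tables).
order = ['C', 'F', 'A', 'B', 'E', 'D']
factors = [{'A'}, {'B'}, {'A', 'B', 'C'}, {'C', 'D'},
           {'C', 'E'}, {'E', 'D', 'F'}]

def first_bucket(f_vars, order):
    """A factor goes in the bucket of its earliest variable in the order."""
    return order[min(order.index(v) for v in f_vars)]

buckets = {v: [] for v in order}
for f in factors:
    buckets[first_bucket(f, order)].append(f)

for v in order[:-1]:                    # stop before the query variable D
    # Multiplying the bucket's factors and summing out v yields a new
    # factor over every other variable appearing in the bucket.
    new = set().union(*buckets[v]) - {v}
    if new:
        buckets[first_bucket(new, order)].append(new)
    print(v, '->', sorted(new))
# C -> ['A', 'B', 'D', 'E']   (the slides' f7)
# F -> ['D', 'E']             (f8)
# A -> ['B', 'D', 'E']        (f9)
# B -> ['D', 'E']             (f10)
# E -> ['D']                  (f11)
```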
Complexity of Variable Elimination
` Hypergraph of Bayes Net.
` A hypergraph has vertices just like an ordinary graph, but instead of edges between pairs of vertices X, Y it contains hyperedges.
` A hyperedge is a set of vertices (i.e., potentially more than one).
Example: vertices A, B, C, D, E with hyperedges {A,B,D}, {B,C,D}, {E,D}.
Complexity of Variable Elimination
` Hypergraph of Bayes Net.
` The set of vertices are precisely the nodes of the Bayes net.
` The hyperedges are the sets of variables appearing in each CPT: {Xi} ∪ Par(Xi).
Variable Elimination in the HyperGraph
` To eliminate variable Xi in the hypergraph we:
` Remove the vertex Xi.
` Create a new hyperedge Hi equal to the union of all of the hyperedges that contain Xi, minus Xi.
` Remove all of the hyperedges containing Xi from the hypergraph.
` Add the new hyperedge Hi to the hypergraph.
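The steps above translate almost literally into code; a sketch using frozensets as hyperedges, applied to the earlier example hypergraph {A,B,D}, {B,C,D}, {E,D}:

```python
# Eliminate variable x from a hypergraph given as a list of hyperedges.
def eliminate(hyperedges, x):
    touching = [h for h in hyperedges if x in h]
    new_edge = frozenset().union(*touching) - {x}   # union minus x
    remaining = [h for h in hyperedges if x not in h]
    return remaining + [new_edge]

# The earlier example: hyperedges {A,B,D}, {B,C,D}, {E,D}; eliminate D.
edges = [frozenset('ABD'), frozenset('BCD'), frozenset('ED')]
print(eliminate(edges, 'D'))   # one new hyperedge over {A, B, C, E}
```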
Complexity of Variable Elimination
` Eliminate D
(figure: the hypergraph on vertices A, B, C, D, E, F before and after eliminating D)
Variable Elimination
` Notice that when we start VE we have a set of factors consisting of the reduced CPTs. The unassigned variables form the vertices, and the set of variables each factor depends on forms the hyperedges of a hypergraph H1.
` If the first variable we eliminate is X, then we remove all factors containing X (all hyperedges) and add a new factor that has as variables the union of the variables in the factors containing X (we add a hyperedge that is the union of the removed hyperedges minus X).
VE: Eliminate B, placing new factor f10 in first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D:
Ordering: C,F,A,B,E,D
Elimination Width
` If the elimination width of an ordering is k, then the complexity of VE using that ordering is 2^O(k).
` Elimination width k means that at some stage in the elimination process a factor involving k variables was generated.
` That factor will require 2^O(k) space to store.
` Space complexity of VE is 2^O(k).
` And it will require 2^O(k) operations to process (either to compute in the first place, or when it is being processed to eliminate one of its variables).
` Time complexity of VE is 2^O(k).
` NOTE that k is the elimination width of this particular ordering.
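The elimination width of a given ordering can be computed by simulating elimination on the factor scopes. A sketch, under one common convention: count the variables of the product factor generated at each step, including the variable being summed out. The star-shaped network below is a made-up example showing how much the ordering matters.

```python
# Elimination width of an ordering: the largest factor generated while
# eliminating variables in that order, measured on hyperedge scopes.
def elimination_width(hyperedges, order):
    edges = [set(h) for h in hyperedges]
    width = 0
    for x in order:
        touching = [h for h in edges if x in h]
        if not touching:
            continue
        new_edge = set().union(*touching)
        width = max(width, len(new_edge))   # scope size before removing x
        edges = [h for h in edges if x not in h] + [new_edge - {x}]
    return width

# Hypothetical star network: hub A with leaves B, C, D.
star = [{'A', 'B'}, {'A', 'C'}, {'A', 'D'}]
print(elimination_width(star, ['B', 'C', 'D', 'A']))  # 2: leaves first
print(elimination_width(star, ['A', 'B', 'C', 'D']))  # 4: hub first blows up
```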
Tree Width
` Exponential in the tree width is the best that VE can do.
` Finding an ordering that has elimination width equal to tree width is NP-Hard.
` So in practice there is no point in trying to speed up VE by finding the best possible elimination ordering.
` Heuristics are used to find orderings with good (low) elimination widths.
` In practice, this can be very successful. Elimination widths can often be relatively small, 8-10, even when the network has 1000s of variables.
` Thus VE can be much!! more efficient than simply summing the probability of all possible events (which is exponential in the number of variables).
` Sometimes, however, the treewidth is equal to the number of variables.
Finding Good Orderings
` A polytree is a singly connected Bayes Net: in particular, there is only one path between any two nodes.
` A node can have multiple parents, but we have no cycles.
` Good orderings are easy to find for polytrees.
` At each stage eliminate a singly connected node.
` Because we have a polytree we are assured that a singly connected node will exist at each elimination stage.
` The size of the factors in the tree never increases.
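This leaf-first strategy is easy to implement; a sketch on a hypothetical polytree skeleton (undirected adjacency), eliminating a singly connected node at each stage while protecting the query variable:

```python
# Build an elimination ordering for a polytree by repeatedly removing
# a node with at most one remaining neighbour (a leaf).
def polytree_ordering(adj, query):
    """adj: {node: set of neighbours} for a polytree skeleton."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while len(adj) > 1:
        # Pick any current leaf other than the query variable.
        leaf = next(v for v, ns in adj.items() if len(ns) <= 1 and v != query)
        for n in adj[leaf]:
            adj[n].discard(leaf)
        del adj[leaf]
        order.append(leaf)
    return order

# Hypothetical polytree: A -> C <- B, C -> D; query D.
adj = {'A': {'C'}, 'B': {'C'}, 'C': {'A', 'B', 'D'}, 'D': {'C'}}
print(polytree_ordering(adj, 'D'))   # ['A', 'B', 'C']
```

Every factor produced along such an ordering stays no larger than the CPTs we started with, which is why polytree inference is cheap.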
Effect of Different Orderings
` Suppose query variable is D. Consider different orderings for this network (not a polytree!):
` A,F,H,G,B,C,E: good
` E,C,A,B,G,H,F: bad
Relevance
`Certain variables have no impact on the query.
In the network A -> B -> C, computing Pr(A) with no evidence requires elimination of B and C.
` But when you sum out these vars, you compute a trivial factor (whose values are all ones); for example:
` eliminating C: f4(B) = ΣC f3(B,C) = ΣC Pr(C|B)
` = 1 for any value of B (e.g., Pr(c|b) + Pr(~c|b) = 1)
` No need to think about B or C for this query.
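That summing a CPT over its child variable always gives a table of ones is easy to confirm; a sketch with made-up numbers for Pr(C|B):

```python
# A CPT sums to 1 over its child variable, so eliminating a barren
# variable produces a trivial factor of all ones.
pr_c_given_b = {(True, True): 0.75, (False, True): 0.25,    # (C, B)
                (True, False): 0.5, (False, False): 0.5}

# f4(B) = sum_C Pr(C|B)
f4 = {b: sum(pr_c_given_b[(c, b)] for c in (True, False))
      for b in (True, False)}
print(f4)   # {True: 1.0, False: 1.0} -- a table of ones
```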
Relevance: Examples
` Query: P(F)
` relevant: F, C, B, A
` Query: P(F|E)
` relevant: F, C, B, A
` also: E, hence D, G
` intuitively, we need to compute P(C|E) to compute P(F|E)
` Query: P(F|H)
` relevant: F, C, A, B:
Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) ΣG Pr(G)Pr(h|G) ΣD Pr(D|G,C) ΣE Pr(E|D)
= Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) ΣG Pr(G)Pr(h|G) ΣD Pr(D|G,C)   [ΣE Pr(E|D) = a table of 1s]
= Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) ΣG Pr(G)Pr(h|G)                [ΣD Pr(D|G,C) = a table of 1s]
= [Pr(A)Pr(B)Pr(C|A,B)Pr(F|C)] [ΣG Pr(G)Pr(h|G)]
[ΣG Pr(G)Pr(h|G)] ≠ 1 but irrelevant: once we normalize, it multiplies each value of F equally.
D-Separation: Intuitions
Flu and Mal are indep. (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Therm.
Flu and Mal are dependent given Fever (or given Therm): nothing blocks the path now. What's the intuition?