Lectures 06: Uncertainty


Reasoning Under Uncertainty

CSC384h: Intro to Artificial Intelligence
Fahiem Bacchus, University of Toronto

This material is covered in chapters 13 and 14. Chapter 13 gives some basic background on probability from the point of view of AI. Chapter 14 talks about Bayesian Networks, which will be a main topic for us.

Uncertainty

• With First Order Logic we examined a mechanism for representing true facts and for reasoning to new true facts.
• The emphasis on truth is sensible in some domains.
• But in many domains it is not sufficient to deal only with true facts. We have to gamble.
• E.g., we don't know for certain what the traffic will be like on a trip to the airport.

Uncertainty

• But how do we gamble rationally?
• If we must arrive at the airport at 9pm on a week night we could safely leave for the airport an hour before.
• Some probability of the trip taking longer, but the probability is low.
• If we must arrive at the airport at 4:30pm on Friday we most likely need 1 hour or more to get to the airport.
• Relatively high probability of it taking 1.5 hours.

Uncertainty

• To act rationally under uncertainty we must be able to evaluate how likely certain things are.
• With FOL a fact F is only useful if it is known to be true or false.
• But we need to be able to evaluate how likely it is that F is true.
• By weighing likelihoods of events (probabilities) we can develop mechanisms for acting rationally under uncertainty.

Dental Diagnosis Example

• In FOL we might formulate
  ∀P. symptom(P, toothache) →
      disease(P, cavity) ∨ disease(P, gumDisease) ∨ disease(P, foodStuck) ∨ …
• When do we stop?
• Cannot list all possible causes.
• We also want to rank the possibilities. We don't want to start drilling for a cavity before checking for more likely causes first.

Probability (over Finite Sets)

• Probability is a function defined over a set of events U, often called the universe of events.
• It assigns a value Pr(e) to each event e ∈ U, in the range [0,1].
• It assigns a value to every set of events F by summing the probabilities of the members of that set:
  Pr(F) = Σ_{e ∈ F} Pr(e)
• Pr(U) = 1, i.e., the sum over all events is 1.
• Therefore: Pr({}) = 0 and
  Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Probability in General

• Given a set U (universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the Axioms of Probability:
  1. Pr(U) = 1
  2. Pr(A) ∈ [0,1]
  3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
• Note: if A ∩ B = {} then Pr(A ∪ B) = Pr(A) + Pr(B)
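To make the axioms concrete, here is a minimal Python sketch; the universe, the numbers, and the helper name prob_of_set are invented for illustration, not from the slides.

```python
# A probability function over a finite universe of atomic events.
pr = {"sun": 0.6, "rain": 0.3, "snow": 0.1}   # Pr(e) for each atomic event e

def prob_of_set(event_set):
    """Pr(F) = sum of Pr(e) for e in F."""
    return sum(pr[e] for e in event_set)

U = set(pr)
A, B = {"sun", "rain"}, {"rain", "snow"}

assert abs(prob_of_set(U) - 1.0) < 1e-9    # Axiom 1: Pr(U) = 1
assert prob_of_set(set()) == 0             # Therefore Pr({}) = 0
# Axiom 3: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
assert abs(prob_of_set(A | B)
           - (prob_of_set(A) + prob_of_set(B) - prob_of_set(A & B))) < 1e-9
```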


Probability over Feature Vectors

• If we had Pr of every atomic event (full instantiation of the variables) we could compute the probability of any other set (including sets that cannot be specified by some set of variable values).
• E.g., {V1 = a} = set of all events where V1 = a
  Pr({V1 = a}) = Σ_{x2 ∈ Dom[V2]} Σ_{x3 ∈ Dom[V3]} … Σ_{xn ∈ Dom[Vn]} Pr(V1=a, V2=x2, V3=x3, …, Vn=xn)
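A concrete sketch of this marginalization in Python; the joint numbers and the two-variable domain are invented for illustration.

```python
# Toy joint distribution over (V1, V2): Pr(V1=v1, V2=v2).
joint = {("a", 0): 0.2, ("a", 1): 0.3, ("b", 0): 0.4, ("b", 1): 0.1}
dom_v2 = [0, 1]

# Pr({V1 = a}) = Σ_{x2 ∈ Dom[V2]} Pr(V1=a, V2=x2)
pr_v1_a = sum(joint[("a", x2)] for x2 in dom_v2)
print(pr_v1_a)  # 0.5
```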


Conditional Probabilities

• In logic one has implication to express conditional statements:
  ∀X. apple(X) → goodToEat(X)
• This assertion only provides useful information about apples.
• With probabilities one has access to a different way of capturing conditional information: conditional probabilities.
• It turns out that conditional probabilities are essential for both representing and reasoning with probabilistic information.

Conditional Probabilities

• Say that A is a set of events such that Pr(A) > 0.
• Then one can define a conditional probability wrt the event A:
  Pr(B|A) = Pr(B ∩ A)/Pr(A)

Conditional Probabilities

[Venn diagram: B covers about 30% of the entire space, but covers over 80% of A.]

Conditional Probabilities

[Venn diagram: B's probability in the new universe A is 0.8.]

Conditional Probabilities

• A conditional probability is a probability function, but now over A instead of over the entire space:
  Pr(A|A) = 1
  Pr(B|A) ∈ [0,1]
  Pr(C ∪ B|A) = Pr(C|A) + Pr(B|A) − Pr(C ∩ B|A)
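A small sketch of the definition over a finite event table; the event names and numbers are invented for illustration.

```python
# Pr over atomic events, plus Pr(B|A) = Pr(B ∩ A)/Pr(A) for Pr(A) > 0.
pr = {"e1": 0.1, "e2": 0.2, "e3": 0.3, "e4": 0.4}

def prob(s):
    return sum(pr[e] for e in s)

def cond_prob(b, a):
    """Pr(B|A) = Pr(B ∩ A) / Pr(A), defined when Pr(A) > 0."""
    return prob(b & a) / prob(a)

A, B = {"e1", "e2", "e3"}, {"e2", "e3", "e4"}
print(cond_prob(B, A))  # (0.2 + 0.3) / 0.6 ≈ 0.833
```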

Properties and Sets

• In First Order Logic, properties like tall(X) are interpreted as a set of individuals (the individuals that are tall).
• Similarly any set of events A can be interpreted as a property: the set of events with property A.
• When we specify big(X) ∧ tall(X) in FOL, we interpret this as the set of individuals that lie in the intersection of the sets big(X) and tall(X).

Independence

• It could be that the density of B on A is identical to its density on the entire set.
• Density: pick an element at random from the entire set. How likely is it that the picked element is in the set B?
• Alternatively, the density of B on A could be much different from its density on the whole space.
• In the first case we say that B is independent of A, while in the second case B is dependent on A.

Independence

[Diagram: the density of B on A; left panel independent, right panel dependent.]

Independence Definition

• A and B are independent properties:
  Pr(B|A) = Pr(B)
• A and B are dependent:
  Pr(B|A) ≠ Pr(B)

Implications for Knowledge

• Say that we have picked an element from the entire set. Then we find out that this element has property A (i.e., is a member of the set A).
• Does this tell us anything more about how likely it is that the element also has property B?
• If B is independent of A then we have learned nothing new about the likelihood of the element being a member of B.

Conditional Independence

• Say we have already learned that the randomly picked element has property A.
• We want to know whether or not the element has property B: Pr(B|A) expresses the probability of this being true.
• Now we learn that the element also has property C. Does this give us more information about B-ness? Pr(B|A ∩ C) expresses the probability of this being true under the additional information.

Conditional Independence

• If
  Pr(B|A ∩ C) = Pr(B|A)
  then we have not gained any additional information from knowing that the element is also a member of the set C.
• In this case we say that B is conditionally independent of C given A.
• That is, once we know A, additionally knowing C is irrelevant (wrt whether or not B is true).
• Conditional independence is independence in the conditional probability space Pr(·|A).
• Note we could have Pr(B|C) ≠ Pr(B). But once we learn A, C becomes irrelevant.

Computational Impact of Independence

• We will see in more detail how independence allows us to speed up computation. But the fundamental insight is that if A and B are independent properties then
  Pr(A ∩ B) = Pr(B) * Pr(A)
• Proof:
  Pr(B|A) = Pr(B)              (independence)
  Pr(A ∩ B)/Pr(A) = Pr(B)      (definition)
  Pr(A ∩ B) = Pr(B) * Pr(A)

Computational Impact of Independence

• Similarly for conditional independence: Pr(B|C ∩ A) = Pr(B|A) implies
  Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)
• Proof:
  Pr(B|C ∩ A) = Pr(B|A)                                       (independence)
  Pr(B ∩ C ∩ A)/Pr(C ∩ A) = Pr(B ∩ A)/Pr(A)                   (defn.)
  Pr(B ∩ C ∩ A)/Pr(A) = Pr(C ∩ A)/Pr(A) * Pr(B ∩ A)/Pr(A)
  Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)                             (defn.)

Computational Impact of Independence

• Conditional independence allows us to break up our computation into distinct parts:
  Pr(B ∩ C|A) = Pr(B|A) * Pr(C|A)
• And it also allows us to ignore certain pieces of information:
  Pr(B|A ∩ C) = Pr(B|A)

Bayes Rule

• Bayes rule is a simple mathematical fact. But it has great implications wrt how probabilities can be reasoned with.
• Pr(Y|X) = Pr(X|Y)Pr(Y)/Pr(X)
• Proof:
  Pr(Y|X) = Pr(Y ∩ X)/Pr(X)
          = Pr(Y ∩ X)/Pr(X) * Pr(Y)/Pr(Y)
          = Pr(Y ∩ X)/Pr(Y) * Pr(Y)/Pr(X)
          = Pr(X|Y)Pr(Y)/Pr(X)

Bayes Rule

• Bayes rule allows us to use a supplied conditional probability in both directions.
• E.g., from treating patients with heart disease we might be able to estimate the value of
  Pr(high_Cholesterol | heart_disease)
• With Bayes rule we can turn this around into a predictor for heart disease:
  Pr(heart_disease | high_Cholesterol)
• Now with a simple blood test we can determine high_Cholesterol and use this to help estimate the likelihood of heart disease.

Bayes Rule

• For this to work we have to deal with the other factors as well:
  Pr(heart_disease | high_Cholesterol)
    = Pr(high_Cholesterol | heart_disease) * Pr(heart_disease) / Pr(high_Cholesterol)
• We will return to this later.

Bayes Rule Example

• Disease ∈ {malaria, cold, flu}; Symptom = fever
• Must compute Pr(D | fever) to prescribe treatment.
• Why not assess this quantity directly?
  • Pr(mal | fever) is not natural to assess; Pr(fever | mal) reflects the underlying causal mechanism.
  • Pr(mal | fever) is not stable: a malaria epidemic changes this quantity (for example).
• So we use Bayes rule:
  Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)

Bayes Rule

• Pr(mal | fever) = Pr(fever | mal)Pr(mal)/Pr(fever)
• What about Pr(mal)? This is the prior probability of malaria, i.e., before you exhibited a fever, and with it we can account for other factors, e.g., a malaria epidemic, or recent travel to a malaria risk zone.

Bayes Rule

• Pr(mal | fever) = Pr(fever | mal)Pr(mal)/Pr(fever)
• What about Pr(fever)?
• We compute
  Pr(fever|mal)Pr(mal), Pr(fever|cold)Pr(cold), Pr(fever|flu)Pr(flu).
• I.e., solve the same problem for each possible cause of fever.
• Pr(fever|mal)Pr(mal) = Pr(fever ∩ mal)/Pr(mal) * Pr(mal) = Pr(fever ∩ mal)
• Similarly we can obtain Pr(fever ∩ cold) and Pr(fever ∩ flu).

Bayes Rule

• Say that in our example m, c and fl are the only possible causes of fever (Pr(fev | ¬m ∧ ¬c ∧ ¬fl) = 0) and they are mutually exclusive. Then
  Pr(fev) = Pr(m ∧ fev) + Pr(c ∧ fev) + Pr(fl ∧ fev)
• So we can sum the numbers we obtained for each disease to obtain the last factor in Bayes Rule: Pr(fever).
• Then we divide each answer by Pr(fever) to get the final probabilities Pr(mal|fever), Pr(cold|fever), Pr(flu|fever).
• This summing and dividing by the sum ensures that Pr(mal|fever) + Pr(cold|fever) + Pr(flu|fever) = 1, and is called normalizing.
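A small Python sketch of this normalization step; the likelihoods and priors below are invented for illustration.

```python
# Assumed toy numbers: Pr(fever|disease) and priors Pr(disease).
likelihood = {"mal": 0.90, "cold": 0.40, "flu": 0.60}
prior      = {"mal": 0.01, "cold": 0.70, "flu": 0.29}

# Pr(fever ∧ d) = Pr(fever|d) Pr(d) for each possible cause d.
joint = {d: likelihood[d] * prior[d] for d in prior}

# Pr(fever) is the sum, assuming these are the only, mutually
# exclusive causes of fever.
pr_fever = sum(joint.values())

# Normalize to get the posterior Pr(d|fever); the values sum to 1.
posterior = {d: joint[d] / pr_fever for d in joint}
print(posterior)
```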

Chain Rule

• Pr(A1 ∩ A2 ∩ … ∩ An)
    = Pr(A1 | A2 ∩ … ∩ An) * Pr(A2 | A3 ∩ … ∩ An) * … * Pr(An-1 | An) * Pr(An)
• Proof:
  Pr(A1 | A2 ∩ … ∩ An) * Pr(A2 | A3 ∩ … ∩ An) * … * Pr(An-1 | An) * Pr(An)
  = Pr(A1 ∩ A2 ∩ … ∩ An)/Pr(A2 ∩ … ∩ An) * Pr(A2 ∩ … ∩ An)/Pr(A3 ∩ … ∩ An) * … * Pr(An-1 ∩ An)/Pr(An) * Pr(An)
  = Pr(A1 ∩ A2 ∩ … ∩ An)   (the product telescopes)

Variable Independence

• Recall that we will be mainly dealing with probabilities over feature vectors.
• We have a set of variables, each with a domain of values.
• It could be that {V1=a} and {V2=b} are independent:
  Pr(V1=a ∧ V2=b) = Pr(V1=a) * Pr(V2=b)
• It could also be that {V1=b} and {V2=b} are not independent:
  Pr(V1=b ∧ V2=b) ≠ Pr(V1=b) * Pr(V2=b)

Variable Independence

• If you know the value of Z (whatever it is), learning Y's value (whatever it is) does not influence your beliefs about any of X's values.
• These definitions differ from earlier ones, which talk about particular sets of events being independent. Variable independence is a concise way of stating a number of individual independencies.

What does independence buy us?

• Suppose (say, boolean) variables X1, X2, …, Xn are mutually independent (i.e., every subset is variable independent of every other subset).
• We can specify the full joint distribution (probability function over all vectors of values) using only n parameters (linear) instead of 2^n − 1 (exponential).
• How? Simply specify Pr(X1), …, Pr(Xn) (i.e., Pr(Xi=true) for all i).
• From this we can recover the probability of any primitive event easily (or any conjunctive query), e.g.
  Pr(X1 ∧ ¬X2 ∧ X3 ∧ X4) = Pr(X1) (1−Pr(X2)) Pr(X3) Pr(X4)
• We can condition on an observed value Xk (or ¬Xk) trivially (see the sketch below):
  Pr(X1 ∧ ¬X2 | X3) = Pr(X1) (1−Pr(X2))
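A minimal sketch of that saving, with invented marginals: n numbers suffice to answer any conjunctive query over mutually independent booleans.

```python
from math import prod

# Assumed marginals Pr(Xi = true) for mutually independent booleans.
pr_true = {"X1": 0.2, "X2": 0.5, "X3": 0.9, "X4": 0.7}

def conj_query(literals):
    """Pr of a conjunction like {"X1": True, "X2": False}: a product."""
    return prod(pr_true[v] if val else 1 - pr_true[v]
                for v, val in literals.items())

# Pr(X1 ∧ ¬X2 ∧ X3 ∧ X4) = Pr(X1)(1−Pr(X2))Pr(X3)Pr(X4)
print(conj_query({"X1": True, "X2": False, "X3": True, "X4": True}))
# Conditioning on independent evidence X3=true just drops it:
print(conj_query({"X1": True, "X2": False}))  # Pr(X1 ∧ ¬X2 | X3)
```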

The Value of Independence

• Complete independence reduces both representation of the joint and inference from O(2^n) to O(n)!
• Unfortunately, such complete mutual independence is very rare. Most realistic domains do not exhibit this property.
• Fortunately, most domains do exhibit a fair amount of conditional independence. And we can exploit conditional independence for representation and inference as well.
• Bayesian networks do just this.

An Aside on Notation

• Pr(X) for variable X (or set of variables) refers to the (marginal) distribution over X. It specifies Pr(X=d) for all d ∈ Dom[X].
• Note Σ_{d ∈ Dom[X]} Pr(X=d) = 1
  (every vector of values must be in one of the sets {X=d}, d ∈ Dom[X]).
• Also Pr(X=d1 ∧ X=d2) = 0 for all d1, d2 ∈ Dom[X], d1 ≠ d2
  (no vector of values contains two different values for X).

An Aside on Notation

• Pr(X|Y) refers to a family of conditional distributions over X, one for each y ∈ Dom[Y].
• For each d ∈ Dom[Y], Pr(X|Y) specifies a distribution over the values of X:
  Pr(X=d1|Y=d), Pr(X=d2|Y=d), …, Pr(X=dn|Y=d)
  for Dom[X] = {d1, d2, …, dn}.
• Distinguish between Pr(X), which is a distribution, and Pr(xi) (xi ∈ Dom[X]), which is a number. Think of Pr(X) as a function that accepts any xi ∈ Dom[X] as an argument and returns Pr(xi).
• Similarly, think of Pr(X|Y) as a function that accepts any xi ∈ Dom[X] and yk ∈ Dom[Y] and returns Pr(xi | yk). Note that Pr(X|Y) is not a single distribution; rather it denotes the family of distributions (over X) induced by the different yk ∈ Dom[Y].

Exploiting Conditional Independence

• Let's see what conditional independence buys us.
• Consider a story: If Craig woke up too early (E), Craig probably needs coffee (C); if Craig needs coffee, he's likely angry (A); if A, there is an increased chance of an aneurysm (burst blood vessel) (B); if B, Craig is quite likely to be hospitalized (H).

E → C → A → B → H

E – Craig woke too early; C – Craig needs coffee; A – Craig is angry; B – Craig burst a blood vessel; H – Craig hospitalized

Cond'l Independence in our Story

• If you learned any of E, C, A, or B, your assessment of Pr(H) would change. E.g., if any of these are seen to be true, you would increase Pr(h) and decrease Pr(~h).
• So H is not independent of E, or C, or A, or B.
• But if you knew the value of B (true or false), learning the value of E, C, or A would not influence Pr(H). The influence these factors have on H is mediated by their influence on B. Craig doesn't get sent to the hospital because he's angry, he gets sent because he's had an aneurysm.
• So H is independent of E, and C, and A, given B.

E → C → A → B → H

Cond'l Independence in our Story

• Similarly:
  • B is independent of E, and C, given A
  • A is independent of E, given C
• This means that:
  • Pr(H | B, {A,C,E}) = Pr(H|B), i.e., this relation holds for any subset of {A,C,E}
  • Pr(B | A, {C,E}) = Pr(B | A)
  • Pr(A | C, {E}) = Pr(A | C)
  • Pr(C | E) and Pr(E) don't simplify

E → C → A → B → H

Example Quantification

• Specifying the joint requires only 9 parameters (if we note that half of these are 1 minus the others), instead of 31 for an explicit representation.
• Linear in the number of vars instead of exponential!
• Linear generally if the dependence has a chain structure.

E → C → A → B → H

Pr(e) = 0.7    Pr(c|e) = 0.9     Pr(a|c) = 0.7     Pr(b|a) = 0.2    Pr(h|b) = 0.9
Pr(~e) = 0.3   Pr(~c|e) = 0.1    Pr(~a|c) = 0.3    Pr(~b|a) = 0.8   Pr(~h|b) = 0.1
               Pr(c|~e) = 0.5    Pr(a|~c) = 0.0    Pr(b|~a) = 0.1   Pr(h|~b) = 0.1
               Pr(~c|~e) = 0.5   Pr(~a|~c) = 1.0   Pr(~b|~a) = 0.9  Pr(~h|~b) = 0.9

Inference is Easy

• Want to know P(a)? Use the summing out rule:

  P(a) = Σ_{ci ∈ Dom(C)} Pr(a|ci) Pr(ci)
       = Σ_{ci ∈ Dom(C)} Pr(a|ci) Σ_{ei ∈ Dom(E)} Pr(ci|ei) Pr(ei)

These are all terms specified in our local distributions!

E → C → A → B → H

Inference is Easy

• Computing P(a) in more concrete terms:
  • P(c) = P(c|e)P(e) + P(c|~e)P(~e) = 0.9 * 0.7 + 0.5 * 0.3 = 0.78
  • P(~c) = P(~c|e)P(e) + P(~c|~e)P(~e) = 0.22
  • P(~c) = 1 − P(c), as well
  • P(a) = P(a|c)P(c) + P(a|~c)P(~c) = 0.7 * 0.78 + 0.0 * 0.22 = 0.546
  • P(~a) = 1 − P(a) = 0.454

E → C → A → B → H

Bayesian Networks

• The structure above is a Bayesian network. A BN is a graphical representation of the direct dependencies over a set of variables, together with a set of conditional probability tables (CPTs) quantifying the strength of those influences.
• Bayes nets generalize the above ideas in very interesting ways, leading to effective means of representation and inference under uncertainty.

Bayesian Networks

• A BN over variables {X1, X2, …, Xn} consists of:
  • a DAG (directed acyclic graph) whose nodes are the variables
  • a set of CPTs (conditional probability tables) Pr(Xi | Par(Xi)), one for each Xi
• Key notions (see text for defns, all are intuitive):
  • parents of a node: Par(Xi)
  • children of a node
  • descendants of a node
  • ancestors of a node
  • family: the set of nodes consisting of Xi and its parents
  • CPTs are defined over families in the BN

Example (Binary valued Variables)

[Figure: an 11-variable Bayes net; a couple of CPTs are shown.]

• An explicit joint requires 2^11 − 1 = 2047 parameters.
• The BN requires only 27 parameters (the number of entries for each CPT is listed).

Semantics of Bayes Nets

• A Bayes net specifies that the joint distribution over the variables in the net can be written as the following product decomposition:
  Pr(X1, X2, …, Xn) = Pr(Xn | Par(Xn)) * Pr(Xn-1 | Par(Xn-1)) * … * Pr(X1 | Par(X1))
• This equation holds for any set of values d1, d2, …, dn for the variables X1, X2, …, Xn.

Semantics of Bayes Nets

• E.g., say we have X1, X2, X3, each with domain Dom[Xi] = {a, b, c}, and we have
  Pr(X1,X2,X3) = P(X3|X2) P(X2) P(X1)
  Then
  Pr(X1=a, X2=a, X3=a) = P(X3=a|X2=a) P(X2=a) P(X1=a)
  Pr(X1=a, X2=a, X3=b) = P(X3=b|X2=a) P(X2=a) P(X1=a)
  Pr(X1=a, X2=a, X3=c) = P(X3=c|X2=a) P(X2=a) P(X1=a)
  Pr(X1=a, X2=b, X3=a) = P(X3=a|X2=b) P(X2=b) P(X1=a)
  …

Semantics of Bayes Nets

• Note that this means we can compute the probability of any setting of the variables using only the information contained in the CPTs of the network.
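A sketch of that product decomposition for the E → C → A → B → H chain, with CPT numbers from the quantification slide (the dictionary layout is mine):

```python
# Pr(e, c, a, b, h) = Pr(e) Pr(c|e) Pr(a|c) Pr(b|a) Pr(h|b)
pr_e = 0.7
cpt = {  # Pr(child = true | parent value)
    "C": {True: 0.9, False: 0.5},
    "A": {True: 0.7, False: 0.0},
    "B": {True: 0.2, False: 0.1},
    "H": {True: 0.9, False: 0.1},
}

def joint(e, c, a, b, h):
    def p(table, child_val, parent_val):
        pt = table[parent_val]
        return pt if child_val else 1 - pt
    return ((pr_e if e else 1 - pr_e)
            * p(cpt["C"], c, e) * p(cpt["A"], a, c)
            * p(cpt["B"], b, a) * p(cpt["H"], h, b))

print(joint(True, True, True, True, True))  # Pr(e,c,a,b,h)
```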

Constructing a Bayes Net

• It is always possible to construct a Bayes net to represent any distribution over the variables X1, X2, …, Xn, using any ordering of the variables.
• Take any ordering of the variables (say, the order given). From the chain rule we obtain
  Pr(X1,…,Xn) = Pr(Xn|X1,…,Xn-1) Pr(Xn-1|X1,…,Xn-2) … Pr(X1)
• Now for each Xi go through its conditioning set X1,…,Xi-1, and iteratively remove all variables Xj such that Xi is conditionally independent of Xj given the remaining variables. Do this until no more variables can be removed.
• The final product will specify a Bayes net.

Constructing a Bayes Net

• The end result will be a product decomposition/Bayes net
  Pr(Xn | Par(Xn)) Pr(Xn-1 | Par(Xn-1)) … Pr(X1 | Par(X1))
• Now we specify the numeric values associated with each term Pr(Xi | Par(Xi)) in a CPT.
• Typically we represent the CPT as a table mapping each setting of {Xi} ∪ Par(Xi) to the probability of Xi taking that particular value given that the variables in Par(Xi) have their specified values.
• If each variable has d different values, we will need a table of size d^|{Xi} ∪ Par(Xi)|. That is, exponential in the size of the parent set.
• Note that the original chain rule
  Pr(X1,…,Xn) = Pr(Xn|X1,…,Xn-1) Pr(Xn-1|X1,…,Xn-2) … Pr(X1)
  requires as much space to represent as specifying the probability of each individual event.

Causal Intuitions

• The BN can be constructed using an arbitrary ordering of the variables.
• However, some orderings will yield BNs with very large parent sets. This requires exponential space, and (as we will see later) exponential time to perform inference.
• Empirically, and conceptually, a good way to construct a BN is to use an ordering based on causality. This often yields a more natural and compact BN.

Causal Intuitions

• Malaria, the flu and a cold all cause aches. So use the ordering in which causes come before effects:
  Malaria, Flu, Cold, Aches
  Pr(M,F,C,A) = Pr(A|M,F,C) Pr(C|M,F) Pr(F|M) Pr(M)
• Each of these diseases affects the probability of aches, so the first conditional probability does not change.
• It is reasonable to assume that these diseases are independent of each other: having or not having one does not change the probability of having the others. So
  Pr(C|M,F) = Pr(C)
  Pr(F|M) = Pr(F)


Burglary Example

• A burglary can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call
• # of parameters: 1 + 1 + 4 + 2 + 2 = 10 (vs. 2^5 − 1 = 31)

Example of Constructing a Bayes Network

• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)?

Example continued

• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)? No
  P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?

Example continued

• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)? No
  P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)? No
  P(B | A, J, M) = P(B | A)?
  P(B | A, J, M) = P(B)?

Example continued

• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)? No
  P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)? No
  P(B | A, J, M) = P(B | A)? Yes
  P(B | A, J, M) = P(B)? No
  P(E | B, A, J, M) = P(E | A)?
  P(E | B, A, J, M) = P(E | A, B)?

Example continued

• Suppose we choose the ordering M, J, A, B, E
  P(J | M) = P(J)? No
  P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)? No
  P(B | A, J, M) = P(B | A)? Yes
  P(B | A, J, M) = P(B)? No
  P(E | B, A, J, M) = P(E | A)? No
  P(E | B, A, J, M) = P(E | A, B)? Yes

Inference in Bayes Nets

• Given a Bayes net
  Pr(X1, X2, …, Xn) = Pr(Xn | Par(Xn)) * Pr(Xn-1 | Par(Xn-1)) * … * Pr(X1 | Par(X1))
  and some evidence E = {a set of values for some of the variables}, we want to compute the new probability distribution Pr(Xk | E).
• That is, we want to figure out Pr(Xk = d | E) for all d ∈ Dom[Xk].

Inference in Bayes Nets

• Other types of examples are: computing the probability of different diseases given symptoms, computing the probability of hail storms given different meteorological evidence, etc.
• In such cases getting a good estimate of the probability of the unknown event allows us to respond more effectively (gamble rationally).

Inference in Bayes Nets

• In the Alarm example we have
  Pr(BreakIn, Earthquake, Radio, Sound) =
    Pr(Earthquake) * Pr(BreakIn) * Pr(Radio|Earthquake) * Pr(Sound|BreakIn, Earthquake)
• And, e.g., we want to compute things like Pr(BreakIn=true | Radio=false, Sound=true).

Variable Elimination

• Variable elimination uses the product decomposition and the summing out rule to compute posterior probabilities from the information (CPTs) already in the network.

Example

Pr(A,B,C,D,E,F,G,H,I,J,K) =
  Pr(A)Pr(B)Pr(C|A)Pr(D|A,B)Pr(E|C)Pr(F|D)Pr(G)Pr(H|E,F)Pr(I|F,G)Pr(J|H,I)Pr(K|I)

Say that E = {H=true, I=false}, and we want to know Pr(D|h,-i) (h: H is true; -i: I is false).

1. Write as a sum for each value of D:
   Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,h,-i,J,K) = Pr(d,h,-i)
   Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,-d,E,F,h,-i,J,K) = Pr(-d,h,-i)

Example

Pr(d,h,-i) = Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,h,-i,J,K)

Use the Bayes net product decomposition to rewrite the summation:

Σ_{A,B,C,E,F,G,J,K} Pr(A,B,C,d,E,F,h,-i,J,K)
 = Σ_{A,B,C,E,F,G,J,K} Pr(A)Pr(B)Pr(C|A)Pr(d|A,B)Pr(E|C)Pr(F|d)Pr(G)Pr(h|E,F)Pr(-i|F,G)Pr(J|h,-i)Pr(K|-i)

Now rearrange the summations so that we are not summing over terms that do not depend on the summed variable.

Example

= Σ_A Σ_B Σ_C Σ_E Σ_F Σ_G Σ_J Σ_K Pr(A)Pr(B)Pr(C|A)Pr(d|A,B)Pr(E|C)Pr(F|d)Pr(G)Pr(h|E,F)Pr(-i|F,G)Pr(J|h,-i)Pr(K|-i)

= Σ_A Pr(A) Σ_B Pr(B) Σ_C Pr(C|A)Pr(d|A,B) Σ_E Pr(E|C) Σ_F Pr(F|d) Σ_G Pr(G)Pr(h|E,F)Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)

= Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)

Example

• Now start computing:
  Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)

  Σ_K Pr(K|-i) = Pr(k|-i) + Pr(-k|-i) = c1
  Σ_J Pr(J|h,-i) c1 = c1 Σ_J Pr(J|h,-i) = c1 (Pr(j|h,-i) + Pr(-j|h,-i)) = c1 c2

Example

  Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)

  c1 c2 Σ_G Pr(G) Pr(-i|F,G) = c1 c2 (Pr(g)Pr(-i|F,g) + Pr(-g)Pr(-i|F,-g))

  But Pr(-i|F,g) depends on the value of F, so this is not a single number!

Example

= Pr(a)Pr(b) Pr(d|a,b) Σ_C Pr(C|a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(a)Pr(-b) Pr(d|a,-b) Σ_C Pr(C|a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(-a)Pr(b) Pr(d|-a,b) Σ_C Pr(C|-a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
+ Pr(-a)Pr(-b) Pr(d|-a,-b) Σ_C Pr(C|-a) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)

Example

• Yikes! The size of the sum is doubling as we expand each variable (into v and -v). This approach has exponential complexity.
• But let's look a bit closer.


Variable Elimination (VE)

• VE works from the inside out, summing out K, then J, then G, …, as we tried to before.
• When we tried to sum out G:
  Σ_A Pr(A) Σ_B Pr(B) Pr(d|A,B) Σ_C Pr(C|A) Σ_E Pr(E|C) Σ_F Pr(F|d) Pr(h|E,F) Σ_G Pr(G) Pr(-i|F,G) Σ_J Pr(J|h,-i) Σ_K Pr(K|-i)
  c1 c2 Σ_G Pr(G) Pr(-i|F,G) = c1 c2 (Pr(g)Pr(-i|F,g) + Pr(-g)Pr(-i|F,-g))
  we found that Pr(-i|F,G) depends on the value of F: it wasn't a single number.
• However, we can still continue with the computation by computing two different numbers, one for each value of F (f, -f)!

Variable Elimination (VE)

• t(-f) = c1 c2 Σ_G Pr(G) Pr(-i|-f,G)
  t(f)  = c1 c2 Σ_G Pr(G) Pr(-i|f,G)
• t(-f) = c1 c2 (Pr(g)Pr(-i|-f,g) + Pr(-g)Pr(-i|-f,-g))
• t(f)  = c1 c2 (Pr(g)Pr(-i|f,g) + Pr(-g)Pr(-i|f,-g))
• Now we sum out F.

Variable Elimination (VE)

• Summing out F: Pr(f|d) Pr(h|E,f) t(f) + Pr(-f|d) Pr(h|E,-f) t(-f)
  (the constants c1 c2 are already inside t)
• This is a function of E, so we obtain two new numbers:
  s(e)  = Pr(f|d) Pr(h|e,f) t(f) + Pr(-f|d) Pr(h|e,-f) t(-f)
  s(-e) = Pr(f|d) Pr(h|-e,f) t(f) + Pr(-f|d) Pr(h|-e,-f) t(-f)


The Product of Two Factors

• Let f(X,Y) & g(Y,Z) be two factors with variables Y in common.
• The product of f and g, denoted h = f x g (or sometimes just h = fg), is defined:
  h(X,Y,Z) = f(X,Y) x g(Y,Z)

  f(A,B)        g(B,C)        h(A,B,C)
  ab    0.9     bc    0.7     abc    0.63    ab~c    0.27
  a~b   0.1     b~c   0.3     a~bc   0.08    a~b~c   0.02
  ~ab   0.4     ~bc   0.8     ~abc   0.28    ~ab~c   0.12
  ~a~b  0.6     ~b~c  0.2     ~a~bc  0.48    ~a~b~c  0.12
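A sketch of the factor product in Python; the representation of a factor as (variable names, table of value-tuples) is mine, and the numbers are the slide's.

```python
from itertools import product

# A factor = (variables, table mapping value-tuples to numbers).
f = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.4, (False, False): 0.6})
g = (("B", "C"), {(True, True): 0.7, (True, False): 0.3,
                  (False, True): 0.8, (False, False): 0.2})

def multiply(f, g):
    """h(X,Y,Z) = f(X,Y) * g(Y,Z), matching on shared variables."""
    fv, ft = f
    gv, gt = g
    hv = fv + tuple(v for v in gv if v not in fv)
    ht = {}
    for vals in product([True, False], repeat=len(hv)):
        env = dict(zip(hv, vals))
        ht[vals] = (ft[tuple(env[v] for v in fv)]
                    * gt[tuple(env[v] for v in gv)])
    return (hv, ht)

h = multiply(f, g)
print(h[1][(True, True, True)])  # h(a,b,c) = 0.9 * 0.7 = 0.63
```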

Restricting a Factor

• Let f(X,Y) be a factor with variable X (Y is a set).
• We restrict factor f to X=a by setting X to the value a and deleting incompatible elements of f's domain. Define h = f_{X=a} as: h(Y) = f(a,Y)

  f(A,B)        h(B) = f_{A=a}
  ab    0.9     b    0.9
  a~b   0.1     ~b   0.1
  ~ab   0.4
  ~a~b  0.6

VE: Example

Network: A → C, B → C, C → D, with factors f1(A), f2(B), f3(A,B,C), f4(C,D)
Query: P(A)?   Evidence: D = d.   Elimination order: C, B

Restriction: replace f4(C,D) with f5(C) = f4(C,d)
Step 1: Compute & add f6(A,B) = Σ_C f5(C) f3(A,B,C). Remove f3(A,B,C), f5(C).
Step 2: Compute & add f7(A) = Σ_B f6(A,B) f2(B). Remove f6(A,B), f2(B).
Last factors: f7(A), f1(A). The product f1(A) x f7(A) is the (unnormalized) posterior. So
P(A|d) = α f1(A) x f7(A), where α = 1/Σ_A f1(A) f7(A).
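Continuing the factor sketch above (same (variables, table) representation), here are the other two VE operations this example uses, restriction and summing out:

```python
def restrict(f, var, value):
    """h(Y) = f(var=value, Y): drop var, keep compatible rows."""
    fv, ft = f
    i = fv.index(var)
    hv = fv[:i] + fv[i + 1:]
    ht = {k[:i] + k[i + 1:]: p for k, p in ft.items() if k[i] == value}
    return (hv, ht)

def sum_out(f, var):
    """h(Y) = sum over the values of var of f(var, Y)."""
    fv, ft = f
    i = fv.index(var)
    hv = fv[:i] + fv[i + 1:]
    ht = {}
    for k, p in ft.items():
        kk = k[:i] + k[i + 1:]
        ht[kk] = ht.get(kk, 0.0) + p
    return (hv, ht)
```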

Numeric Example

• Here's the example with some numbers.
• Network: A → B → C, with factors f1(A), f2(A,B), f3(B,C).
• f4(B) = Σ_A f2(A,B) f1(A);  f5(C) = Σ_B f3(B,C) f4(B)

  f1(A)       f2(A,B)       f3(B,C)       f4(B)       f5(C)
  a    0.9    ab    0.9     bc    0.7     b    0.85   c    0.625
  ~a   0.1    a~b   0.1     b~c   0.3     ~b   0.15   ~c   0.375
              ~ab   0.4     ~bc   0.2
              ~a~b  0.6     ~b~c  0.8
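Using the multiply and sum_out sketches above, this chain reproduces the slide's numbers:

```python
f1 = (("A",), {(True,): 0.9, (False,): 0.1})
f2 = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.4, (False, False): 0.6})
f3 = (("B", "C"), {(True, True): 0.7, (True, False): 0.3,
                   (False, True): 0.2, (False, False): 0.8})

f4 = sum_out(multiply(f2, f1), "A")   # f4(b) = 0.85, f4(~b) = 0.15
f5 = sum_out(multiply(f3, f4), "B")   # f5(c) = 0.625, f5(~c) = 0.375
print(f4[1], f5[1])
```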

VE: Buckets as a Notational Device

Network: C has parents A, B; D has parent C; E has parent C; F has parents E, D.
Factors: f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(E,D,F)
Ordering: C, F, A, B, E, D

1. C:
2. F:
3. A:
4. B:
5. E:
6. D:

VE: Buckets — place the original factors in the first applicable bucket.

Ordering: C, F, A, B, E, D

1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A)
4. B: f2(B)
5. E:
6. D:

VE: Eliminate the variables in order, placing the new factor in the first applicable bucket.

Ordering: C, F, A, B, E, D

1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B)
5. E: f8(E,D)
6. D:

1. Σ_C f3(A,B,C) f4(C,D) f5(C,E) = f7(A,B,D,E)
2. Σ_F f6(E,D,F) = f8(E,D)

VE: Eliminate the variables in order, placing the new factor in the first applicable bucket.

Ordering: C, F, A, B, E, D

1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D:

3. Σ_A f1(A) f7(A,B,D,E) = f9(B,D,E)
4. Σ_B f2(B) f9(B,D,E) = f10(D,E)

VE: Eliminate the variables in order, placing the new factor in the first applicable bucket.

Ordering: C, F, A, B, E, D

1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D: f11(D)

5. Σ_E f8(E,D) f10(D,E) = f11(D)

f11 is the final answer, once we normalize it.

Complexity of Variable Elimination

• Hypergraph of a Bayes net:
  • A hypergraph has vertices just like an ordinary graph, but instead of edges between two vertices X–Y it contains hyperedges.
  • A hyperedge is a set of vertices (i.e., potentially more than one).
• Example: vertices {A, B, C, D, E} with hyperedges {A,B,D}, {B,C,D}, {E,D}.

Complexity of Variable Elimination

• Hypergraph of a Bayes net:
  • The set of vertices are precisely the nodes of the Bayes net.
  • The hyperedges are the sets of variables appearing in each CPT: {Xi} ∪ Par(Xi).

Variable Elimination in the Hypergraph

• To eliminate variable Xi in the hypergraph we:
  • remove the vertex Xi
  • create a new hyperedge Hi equal to the union of all of the hyperedges that contain Xi, minus Xi
  • remove all of the hyperedges containing Xi from the hypergraph
  • add the new hyperedge Hi to the hypergraph.
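A small sketch of this hyperedge bookkeeping; the set-of-frozensets representation is mine, and the example edges are the ones from the slide above.

```python
def eliminate(hyperedges, var):
    """Remove var: union the hyperedges containing it, minus var."""
    touching = [h for h in hyperedges if var in h]
    new_edge = frozenset().union(*touching) - {var}
    return {h for h in hyperedges if var not in h} | {new_edge}

edges = {frozenset("ABD"), frozenset("BCD"), frozenset("ED")}
print(eliminate(edges, "D"))  # one hyperedge {A, B, C, E}
```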

Complexity of Variable Elimination

• Eliminate D.

[Diagram: the hypergraph over A, B, C, D, E, F before and after eliminating D.]

Variable Elimination

• Notice that when we start VE we have a set of factors consisting of the reduced (restricted) CPTs. The unassigned variables form the vertices, and the set of variables each factor depends on forms the hyperedges, of a hypergraph H1.
• If the first variable we eliminate is X, then we remove all factors containing X (all such hyperedges) and add a new factor that has as variables the union of the variables in the factors containing X (we add a hyperedge that is the union of the removed hyperedges minus X).


VE: Eliminate B, placing the new factor f10 in the first applicable bucket.

Ordering: C, F, A, B, E, D

1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(E,D,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B), f9(B,D,E)
5. E: f8(E,D), f10(D,E)
6. D:

Elimination Width

• If the elimination width of an ordering is k, then the complexity of VE using that ordering is 2^O(k).
• Elimination width k means that at some stage in the elimination process a factor involving k variables was generated.
  • That factor will require 2^O(k) space to store: the space complexity of VE is 2^O(k).
  • And it will require 2^O(k) operations to process (either to compute in the first place, or when it is being processed to eliminate one of its variables): the time complexity of VE is 2^O(k).
• Note that k is the elimination width of this particular ordering.

Tree Width

• Exponential in the tree width is the best that VE can do.
• Finding an ordering that has elimination width equal to the tree width is NP-hard.
  • So in practice there is no point in trying to speed up VE by finding the best possible elimination ordering.
  • Heuristics are used to find orderings with good (low) elimination widths.
• In practice, this can be very successful. Elimination widths can often be relatively small, 8-10, even when the network has 1000s of variables.
  • Thus VE can be much!! more efficient than simply summing the probability of all possible events (which is exponential in the number of variables).
  • Sometimes, however, the tree width is equal to the number of variables.

Finding Good Orderings

• A polytree is a singly connected Bayes net: in particular, there is only one path between any two nodes.
• A node can have multiple parents, but we have no cycles.
• Good orderings are easy to find for polytrees:
  • At each stage eliminate a singly connected node.
  • Because we have a polytree we are assured that a singly connected node will exist at each elimination stage.
  • The size of the factors in the tree never increases.

Effect of Different Orderings

• Suppose the query variable is D. Consider different orderings for this network (not a polytree!):
  • A, F, H, G, B, C, E: good
  • E, C, A, B, G, H, F: bad

Relevance

• Certain variables have no impact on the query. In the network A → B → C, computing Pr(A) with no evidence requires elimination of B and C.
• But when you sum out these vars, you compute a trivial factor (whose values are all ones); for example:
  eliminating C: f4(B) = Σ_C f3(B,C) = Σ_C Pr(C|B) = 1 for any value of B (e.g., Pr(c|b) + Pr(~c|b) = 1)
• No need to think about B or C for this query.

Relevance: Examples

• Query: P(F)
  • relevant: F, C, B, A
• Query: P(F|E)
  • relevant: F, C, B, A
  • also: E, hence D, G
  • intuitively, we need to compute P(C|E) to compute P(F|E)
• Query: P(F|H)
  • relevant: F, C, A, B.
  Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) Pr(G)Pr(h|G)Pr(D|G,C)Pr(E|D)
  = Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) Pr(G)Pr(h|G)Pr(D|G,C) Σ_E Pr(E|D)   (Σ_E Pr(E|D) = a table of 1s)
  = Pr(A)Pr(B)Pr(C|A,B)Pr(F|C) Pr(G)Pr(h|G) Σ_D Pr(D|G,C)          (= a table of 1s)
  = [Pr(A)Pr(B)Pr(C|A,B)Pr(F|C)] [Pr(G)Pr(h|G)]
  [Pr(G)Pr(h|G)] ≠ 1, but it is irrelevant: once we normalize, it multiplies each value of F equally.


D-Separation: Intuitions

• Flu and Mal are independent (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Therm.
• Flu and Mal are dependent given Fever (or given Therm): nothing blocks the path now. What's the intuition?