
Page 1: Announcements

CS 484 – Artificial Intelligence 1

Announcements

• Homework 8 due today, November 13
• ½ to 1 page description of final project due Thursday, November 15
• Current Events
  • Christian - now
  • Jeff - Thursday
• Research Paper due Tuesday, November 20

Page 2: Announcements

Probabilistic Reasoning

Lecture 15

Page 3: Announcements

CS 484 – Artificial Intelligence 3

Probabilistic Reasoning

• Logic deals with certainties: A → B
• Probabilities are expressed in a notation similar to that of predicates in First Order Predicate Calculus:
  • P(R) = 0.7
  • P(S) = 0.1
  • P(¬(A Λ B) V C) = 0.2
• 1 = certain; 0 = certainly not

Page 4: Announcements

CS 484 – Artificial Intelligence 4

What's the probability that either A is true or B is true?

P(A V B) = P(A) + P(B) – P(A Λ B)

[Venn diagram: two overlapping circles A and B; the overlap is A Λ B]

Page 5: Announcements

CS 484 – Artificial Intelligence 5

Conditional Probability

• Conditional probability refers to the probability of one thing given that we already know another to be true:

P(B|A) = P(A Λ B) / P(A)

• This states the probability of B, given A.

[Venn diagram: two overlapping circles A and B; the overlap is A Λ B]

Page 6: Announcements

CS 484 – Artificial Intelligence 6

Calculate

• P(R|S), given that the probability of rain is 0.7, the probability of sun is 0.1, and the probability of rain and sun is 0.01

• P(R|S) = P(R Λ S) / P(S) = 0.01 / 0.1 = 0.1

• Note: P(A|B) ≠ P(B|A)
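A minimal sketch of this calculation in Python, using the probabilities given on the slide (P(R), P(S), and P(R Λ S)):

```python
# Conditional probability: P(R|S) = P(R and S) / P(S)
p_rain = 0.7             # P(R)
p_sun = 0.1              # P(S)
p_rain_and_sun = 0.01    # P(R Λ S)

p_rain_given_sun = p_rain_and_sun / p_sun    # 0.1
p_sun_given_rain = p_rain_and_sun / p_rain   # ≈ 0.014, illustrating P(A|B) ≠ P(B|A)
print(p_rain_given_sun, p_sun_given_rain)
```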

Page 7: Announcements

CS 484 – Artificial Intelligence 7

Joint Probability Distributions

• A joint probability distribution represents the combined probabilities of two or more variables.

        A      ¬A
  B     0.11   0.09
  ¬B    0.63   0.17

• This table shows, for example, that P(A Λ B) = 0.11 and P(¬A Λ B) = 0.09
• Using this, we can calculate P(A):
  P(A) = P(A Λ B) + P(A Λ ¬B) = 0.11 + 0.63 = 0.74
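A small sketch of marginalising the joint table in Python (the joint values are the ones shown above):

```python
# Joint distribution over A and B, keyed by (A, B) truth values
joint = {
    (True, True): 0.11, (False, True): 0.09,
    (True, False): 0.63, (False, False): 0.17,
}

# Marginal P(A): sum the joint over all values of B
p_a = sum(p for (a, b), p in joint.items() if a)   # 0.11 + 0.63 = 0.74
print(p_a)
```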

Page 8: Announcements

CS 484 – Artificial Intelligence 8

Bayes’ Theorem

• Bayes’ theorem lets us calculate a conditional probability:

P(B|A) = P(A|B) · P(B) / P(A)

• P(B) is the prior probability of B.
• P(B|A) is the posterior probability of B.

Page 9: Announcements

CS 484 – Artificial Intelligence 9

Bayes' Theorem Deduction

• Recall: P(B|A) = P(A Λ B) / P(A)
• Similarly, P(A|B) = P(A Λ B) / P(B), so P(A Λ B) = P(A|B) · P(B)
• Substituting this into the first equation gives Bayes’ theorem: P(B|A) = P(A|B) · P(B) / P(A)

Page 10: Announcements

CS 484 – Artificial Intelligence 10

Medical Diagnosis

• Data
  • 80% of the time you have a cold, you also have a high temperature.
  • At any one time, 1 in every 10,000 people has a cold.
  • 1 in every 1,000 people has a high temperature.
• Suppose you have a high temperature. What is the likelihood that you have a cold?
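A worked sketch in Python applying Bayes’ theorem to the figures on this slide:

```python
# P(cold | high temp) = P(high temp | cold) * P(cold) / P(high temp)
p_ht_given_cold = 0.8     # 80% of colds come with a high temperature
p_cold = 1 / 10_000       # prior probability of a cold
p_ht = 1 / 1_000          # prior probability of a high temperature

p_cold_given_ht = p_ht_given_cold * p_cold / p_ht
print(p_cold_given_ht)    # 0.08 – a high temperature alone makes a cold fairly unlikely
```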

Page 11: Announcements

CS 484 – Artificial Intelligence 11

Witness Reliability

• A hit-and-run incident has been reported, and an eyewitness has stated she is certain that the car was a white taxi.
• How likely is it that she is right?
• Facts:
  • The yellow taxi company has 90 cars.
  • The white taxi company has 10 cars.
  • An expert says that, given the foggy weather, the witness has a 75% chance of correctly identifying the taxi.

Page 12: Announcements

CS 484 – Artificial Intelligence 12

Witness Reliability – Prior Probability

• Imagine the lady is shown a sequence of 1000 cars
  • Expect 900 to be yellow and 100 to be white
• Given 75% accuracy, how many will she say are white and how many yellow?
  • Of the 900 yellow cars, she says yellow for 675 and white for 225
  • Of the 100 white cars, she says white for 75 and yellow for 25
• What is the probability the witness says white?
• How likely is it that she is right?
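A short sketch, under the assumptions stated above (90% of taxis are yellow, 75% witness accuracy), working out the two questions:

```python
# Prior over taxi colour and witness accuracy
p_white, p_yellow = 0.1, 0.9
accuracy = 0.75

# Probability the witness says "white"
p_says_white = accuracy * p_white + (1 - accuracy) * p_yellow   # 0.075 + 0.225 = 0.3

# Bayes: probability the taxi really is white, given that she says white
p_white_given_says_white = accuracy * p_white / p_says_white    # 0.075 / 0.3 = 0.25
print(p_says_white, p_white_given_says_white)
```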

Page 13: Announcements

CS 484 – Artificial Intelligence 13

Comparing Conditional Probabilities

• Medical diagnosis
  • Probability of cold (C) is 0.0001
  • P(HT|C) = 0.8
  • Probability of plague (P) is 0.000000001
  • P(HT|P) = 0.99
• Relative likelihood of cold and plague:

P(C|HT) = P(HT|C) · P(C) / P(HT)
P(P|HT) = P(HT|P) · P(P) / P(HT)

P(C|HT) / P(P|HT) = [P(HT|C) · P(C)] / [P(HT|P) · P(P)]
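A brief sketch evaluating that ratio with the numbers above:

```python
# Relative likelihood of cold vs. plague given a high temperature
p_cold, p_ht_given_cold = 0.0001, 0.8
p_plague, p_ht_given_plague = 0.000000001, 0.99

ratio = (p_ht_given_cold * p_cold) / (p_ht_given_plague * p_plague)
print(ratio)   # ≈ 80808 – a cold is tens of thousands of times more likely than plague
```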

Page 14: Announcements

CS 484 – Artificial Intelligence 14

Simple Bayesian Concept Learning (1)

• P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
• Let us suppose we have a set of hypotheses H1…Hn.
• For each Hi:

P(Hi|E) = P(E|Hi) · P(Hi) / P(E)

• Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.

Page 15: Announcements

CS 484 – Artificial Intelligence 15

Simple Bayesian Concept Learning (2)

• In fact, this can be simplified.
• Since P(E) is independent of Hi, it will have the same value for each hypothesis.
• Hence, it can be ignored, and we can find the hypothesis with the highest value of:

P(E|Hi) · P(Hi)

• We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi).
• This is the likelihood of E given Hi.
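A tiny illustrative sketch of choosing the MAP hypothesis; the likelihoods and priors here are made-up placeholder numbers, not from the slides:

```python
# Hypothetical values (P(E|Hi), P(Hi)) for three hypotheses
hypotheses = {
    "H1": (0.9, 0.1),
    "H2": (0.4, 0.6),
    "H3": (0.2, 0.3),
}

# MAP hypothesis: the one maximising P(E|Hi) * P(Hi); P(E) can be ignored
map_h = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])
print(map_h)   # "H2" with these placeholder numbers
```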

Page 16: Announcements

CS 484 – Artificial Intelligence 16

Bayesian Belief Networks (1)

• A belief network shows the dependencies between a group of variables.

• Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.

• C and D are dependent on A; D and E are dependent on B.

• The Bayesian belief network has probabilities associated with each link. E.g., P(C|A) = 0.2, P(C|¬A) = 0.4

Page 17: Announcements

CS 484 – Artificial Intelligence 17

Bayesian Belief Networks (2)

• A complete set of probabilities for this belief network might be:
  • P(A) = 0.1
  • P(B) = 0.7
  • P(C|A) = 0.2
  • P(C|¬A) = 0.4
  • P(D|A Λ B) = 0.5
  • P(D|A Λ ¬B) = 0.4
  • P(D|¬A Λ B) = 0.2
  • P(D|¬A Λ ¬B) = 0.0001
  • P(E|B) = 0.2
  • P(E|¬B) = 0.1

Page 18: Announcements

CS 484 – Artificial Intelligence 18

Bayesian Belief Networks (3)

• We can now calculate probabilities from the network. For example, the full joint probability:

P(A,B,C,D,E) = P(E|A,B,C,D) · P(A,B,C,D)
             = P(E|A,B,C,D) · P(D|A,B,C) · P(C|A,B) · P(B|A) · P(A)

• In fact, we can simplify this, since there are no dependencies between certain pairs of variables – between E and A, for example. Hence:

P(A,B,C,D,E) = P(E|B) · P(D|A Λ B) · P(C|A) · P(B) · P(A)
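A small sketch evaluating that factorisation for the all-true assignment, using the probabilities listed on the previous page:

```python
# P(A, B, C, D, E) = P(E|B) * P(D|A,B) * P(C|A) * P(B) * P(A)
p_a, p_b = 0.1, 0.7
p_c_given_a = 0.2
p_d_given_a_b = 0.5
p_e_given_b = 0.2

p_joint = p_e_given_b * p_d_given_a_b * p_c_given_a * p_b * p_a
print(p_joint)   # 0.0007
```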

Page 19: Announcements

CS 484 – Artificial Intelligence 19

College Life Example

• C = that you will go to college
• S = that you will study
• P = that you will party
• E = that you will be successful in your exams
• F = that you will have fun

[Belief network: C → S, C → P; S and P → E; P → F]

Page 20: Announcements

CS 484 – Artificial Intelligence 20

College Life Example

[Belief network: C → S, C → P; S and P → E; P → F]

P(C) = 0.2

  C     | P(S)
  true  | 0.8
  false | 0.2

  C     | P(P)
  true  | 0.6
  false | 0.5

  S     | P     | P(E)
  true  | true  | 0.6
  true  | false | 0.9
  false | true  | 0.1
  false | false | 0.2

  P     | P(F)
  true  | 0.9
  false | 0.7

Page 21: Announcements

CS 484 – Artificial Intelligence 21

College Example

• Using the tables to solve problems such as
  P(C = true, S = true, P = false, E = true, F = false) = P(C, S, ¬P, E, ¬F)

• General solution:

P(x1, …, xn) = Π (i = 1 to n) P(xi | parents(Xi))

• For this query:

P(C, S, ¬P, E, ¬F) = P(C) · P(S|C) · P(¬P|C) · P(E|S Λ ¬P) · P(¬F|¬P)
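A quick sketch of that product, taking the values from the tables on the previous slide:

```python
# P(C, S, ¬P, E, ¬F) = P(C) * P(S|C) * P(¬P|C) * P(E|S,¬P) * P(¬F|¬P)
p_c = 0.2
p_s_given_c = 0.8
p_not_p_given_c = 1 - 0.6        # P(¬P|C) = 0.4
p_e_given_s_notp = 0.9
p_not_f_given_notp = 1 - 0.7     # P(¬F|¬P) = 0.3

result = p_c * p_s_given_c * p_not_p_given_c * p_e_given_s_notp * p_not_f_given_notp
print(result)   # 0.01728
```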

Page 22: Announcements

CS 484 – Artificial Intelligence 22

Noisy-V Function

• We would like to assume we know all the causes of a possible event
  • E.g. Medical Diagnosis System
    • P(HT|C) = 0.8
    • P(HT|P) = 0.99
    • Assume P(HT|C V P) = 1 (?)
  • This assumption is clearly not true
• Leak node – represents all other causes
  • P(HT|O) = 0.9
• Define noise parameters – conditional probabilities for ¬HT
  • P(¬HT|C) = 1 – P(HT|C) = 0.2
  • P(¬HT|P) = 1 – P(HT|P) = 0.01
  • P(¬HT|O) = 1 – P(HT|O) = 0.1
• Further assumption – the causes of a high temperature are independent of each other, and the noise parameters are independent

Page 23: Announcements

CS 484 – Artificial Intelligence 23

Noisy V-Function

• Benefit of the Noisy-V Function
  • If cold, plague, and other are all false, P(¬HT) = 1
  • Otherwise, P(¬HT) is equal to the product of the noise parameters for all the variables that are true
    • E.g. if plague and other are true and cold is false, P(HT) = 1 – (0.01 * 0.1) = 0.999
• Benefit – we don’t need to store as many values as in the full Bayesian belief network
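A compact sketch of the noisy-V (noisy-OR) calculation using the noise parameters above:

```python
# Noise parameters: P(¬HT | cause) for each cause
noise = {"cold": 0.2, "plague": 0.01, "other": 0.1}

def p_high_temp(present_causes):
    """Noisy-OR: P(HT) = 1 - product of noise parameters of the causes that are true."""
    if not present_causes:
        return 0.0            # no cause present => P(¬HT) = 1
    p_not_ht = 1.0
    for cause in present_causes:
        p_not_ht *= noise[cause]
    return 1 - p_not_ht

print(p_high_temp({"plague", "other"}))   # 1 - 0.01*0.1 = 0.999
```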

Page 24: Announcements

CS 484 – Artificial Intelligence 24

Bayes’ Optimal Classifier

• A system that uses Bayes’ theory to classify data.
• We have a piece of data y, and are seeking the correct hypothesis from H1 … H5, each of which assigns a classification to y.
• The probability that y should be classified as cj is:

P(cj|x1,…,xn) = Σ (i = 1 to m) P(cj|hi) · P(hi|x1,…,xn)

• x1 to xn are the training data, and m is the number of hypotheses.
• This method provides the best possible classification for a piece of data.
• Example: given some data, we will classify it as true or false, using the hypotheses below:
  • P(true|x1,…,xn) = 0.2 + 0.3 + 0.25 = 0.75
  • P(false|x1,…,xn) = 0.1 + 0.15 = 0.25

P(H1| x1,…,xn) = 0.2 P(false|H1) = 0 P(true|H1) = 1

P(H2| x1,…,xn) = 0.3 P(false|H2) = 0 P(true|H2) = 1

P(H3| x1,…,xn) = 0.1 P(false|H3) = 1 P(true|H3) = 0

P(H4| x1,…,xn) = 0.25 P(false|H4) = 0 P(true|H4) = 1

P(H5| x1,…,xn) = 0.15 P(false|H5) = 1 P(true|H5) = 0
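A sketch of the optimal-classifier vote over the five hypotheses listed above:

```python
# (posterior P(Hi|x1..xn), classification assigned by the hypothesis) for H1..H5
hypotheses = [(0.2, True), (0.3, True), (0.1, False), (0.25, True), (0.15, False)]

# P(c|x1..xn) = sum of P(c|Hi) * P(Hi|x1..xn); here P(c|Hi) is either 1 or 0
p_true = sum(p for p, c in hypotheses if c)       # 0.75
p_false = sum(p for p, c in hypotheses if not c)  # 0.25
print("true" if p_true > p_false else "false")    # "true"
```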

Page 25: Announcements

CS 484 – Artificial Intelligence 25

The Naïve Bayes Classifier (1)

• A vector of data is assigned a single classification: P(ci|d1, …, dn)
• The classification with the highest posterior probability is chosen.
• The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis.
• In this case, we are looking for the MAP classification.
• Bayes’ theorem is used to find the posterior probability:

P(ci|d1, …, dn) = P(d1, …, dn|ci) · P(ci) / P(d1, …, dn)

Page 26: Announcements

CS 484 – Artificial Intelligence 26

The Naïve Bayes Classifier (2)

• Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it, and simply aim to find the classification ci for which the following is maximised:

P(d1, …, dn|ci) · P(ci)

• We now assume that all the attributes d1, …, dn are independent.
• So P(d1, …, dn|ci) · P(ci) can be rewritten as:

P(ci) · Π (j = 1 to n) P(dj|ci)

• The classification for which this is highest is chosen to classify the data.

Page 27: Announcements

CS 484 – Artificial Intelligence 27

Classifier Example

Training Data

  x  y  z  | Classification
  2  3  2  | A
  4  1  4  | B
  1  3  2  | A
  2  4  3  | A
  4  2  4  | B
  2  1  3  | C
  1  2  4  | A
  2  3  3  | B
  2  2  4  | A
  3  3  3  | C
  3  2  1  | A
  1  2  1  | B
  2  1  4  | A
  4  3  4  | C
  2  2  4  | A

• New piece of data to classify: (x = 2, y = 3, z = 4)
• Want P(ci|x=2, y=3, z=4)
• P(A) * P(x=2|A) * P(y=3|A) * P(z=4|A) = (8/15) · (5/8) · (2/8) · (4/8) ≈ 0.0417
• P(B) * P(x=2|B) * P(y=3|B) * P(z=4|B)
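A sketch of the naive Bayes computation over the training table above; the counts are taken directly from the table:

```python
# Training data from the table: (x, y, z, class)
data = [
    (2, 3, 2, "A"), (4, 1, 4, "B"), (1, 3, 2, "A"), (2, 4, 3, "A"), (4, 2, 4, "B"),
    (2, 1, 3, "C"), (1, 2, 4, "A"), (2, 3, 3, "B"), (2, 2, 4, "A"), (3, 3, 3, "C"),
    (3, 2, 1, "A"), (1, 2, 1, "B"), (2, 1, 4, "A"), (4, 3, 4, "C"), (2, 2, 4, "A"),
]

def naive_bayes_score(query, cls):
    """P(cls) * P(x=..|cls) * P(y=..|cls) * P(z=..|cls), estimated from raw counts."""
    rows = [r for r in data if r[3] == cls]
    score = len(rows) / len(data)                         # class prior
    for i, value in enumerate(query):                     # attribute likelihoods
        score *= sum(1 for r in rows if r[i] == value) / len(rows)
    return score

query = (2, 3, 4)
for cls in ("A", "B", "C"):
    print(cls, round(naive_bayes_score(query, cls), 4))   # A scores ≈ 0.0417
```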

Page 28: Announcements

CS 484 – Artificial Intelligence 28

M-estimate

• Problem with too little training data
  • (x=1, y=2, z=2)
  • P(x=1 | B) = 1/4
  • P(y=2 | B) = 2/4
  • P(z=2 | B) = 0
• Avoid the problem by using the M-estimate, which pads the computation with additional virtual samples
  • Conditional probability = (a + mp) / (b + m)
  • m = 5 (equivalent sample size)
  • p = 1 / number of values for the attribute (1/4 for x)
  • a = number of training examples with the attribute value and classification (x=1 and B: 1)
  • b = number of training examples with the classification (B: 4)
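A short sketch of the M-estimate smoothing described above (m = 5, p = 1/4):

```python
def m_estimate(a, b, p, m=5):
    """Smoothed conditional probability: (a + m*p) / (b + m)."""
    return (a + m * p) / (b + m)

# P(x=1 | B): 1 of the 4 B examples has x=1, and x takes 4 possible values
print(m_estimate(a=1, b=4, p=1/4))   # 0.25
# P(z=2 | B): no B example has z=2, but the estimate stays above zero
print(m_estimate(a=0, b=4, p=1/4))   # ≈ 0.139
```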

Page 29: Announcements

CS 484 – Artificial Intelligence 29

Collaborative Filtering

• A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.

• If we know that Anne and Bob both like A, B, and C, and that Anne likes D, then we guess that Bob would also like D.
  • P(Bob likes Z | Bob likes A, Bob likes B, …, Bob likes Y)
• Can be calculated using decision trees.

[Decision tree figure]