Belief Networks & Bayesian Classification

DESCRIPTION

An Introduction to Bayesian Belief Networks and Naïve Bayesian Classification

TRANSCRIPT

AN INTRODUCTION TO BAYESIAN BELIEF NETWORKS AND NAÏVE BAYESIAN CLASSIFICATION
ADNAN MASOOD
SCIS.NOVA.EDU/~ADNAN
Overview
Probability and Uncertainty
Probability Notation
Bayesian Statistics
Notation of Probability
Axioms of Probability
Probability Table
Bayesian Belief Network
Joint Probability Table
Probability of Disjunctions
Conditional Probability
Conditional Independence
Bayes' Rule
Classification with Bayes' Rule
Bayesian Classification
Conclusion & Further Reading
Probability and Uncertainty
Probability provides a way of summarizing uncertainty:
60% chance of rain today
85% chance of alarm in case of a burglary
Probability is calculated based upon past performance, or degree of belief.
Bayesian Statistics
Three approaches to Probability
Axiomatic: probability by definition and properties
Relative Frequency: repeated trials
Degree of belief (subjective): personal measure of uncertainty
Examples
The chance that a meteor strikes earth is 1%
The probability of rain today is 30%
The chance of getting an A on the exam is 50%
Notation of Probability
Random Variables (RV):
are (usually) capitalized, e.g. Sky, RoadCurvature, Temperature
refer to attributes of the world whose “status” is unknown
have one and only one value at a time
have a domain of values that are possible states of the world:
Boolean: domain is {true, false}
Discrete: domain is countable (includes Boolean); values are exhaustive and mutually exclusive, e.g. the domain of Sky
Continuous: domain is the real numbers
Notation of Probability
Uncertainty is represented by P(X = x), or simply P(x): the degree of belief that variable X takes on value x given no other information. Because it relates to a single variable with no evidence, this is called an unconditional or prior probability.
Properties of P(X):
The sum over all values in the domain of variable X is 1, because the domain is exhaustive and mutually exclusive.
Axioms of Probability
S – Sample Space (set of possible outcomes)
E – Some Event (some subset of outcomes)
Axioms
0 ≤ P(E) ≤ 1
P(S) = 1
For any sequence of mutually exclusive events E1, E2, …: P(E1 ∪ E2 ∪ …) = P(E1) + P(E2) + …
Probability Table
Calculate probabilities from data:

| Outlook | sunny | overcast | rainy |
|---------|-------|----------|-------|
| P       | 5/14  | 4/14     | 5/14  |

P(Weather=sunny) = P(sunny) = 5/14
P(Weather) = {5/14, 4/14, 5/14}
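The table above, estimating P(Weather) as relative frequencies, can be sketched in Python. The `outlooks` list is a hypothetical stand-in for the dataset's 14 Outlook observations (5 sunny, 4 overcast, 5 rainy, matching the counts on the slide):

```python
from collections import Counter
from fractions import Fraction

# Hypothetical 14 observations of the Outlook attribute,
# matching the counts in the probability table above.
outlooks = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

counts = Counter(outlooks)
n = len(outlooks)

# P(Weather = v) = count(v) / total observations
p_weather = {v: Fraction(c, n) for v, c in counts.items()}

print(p_weather["sunny"])       # 5/14
print(sum(p_weather.values()))  # 1 -- the domain is exhaustive and mutually exclusive
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not subject to floating-point rounding.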
An expert-built belief network using the weather dataset (Mitchell; Witten & Frank)
Bayesian inference can help answer questions like the probability of game play if
a. Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong
b. Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong
Bayesian Belief Network
A Bayesian belief network allows a subset of the variables to be conditionally independent
A graphical model of causal relationships
Several cases of learning Bayesian belief networks:
• Given both network structure and all the variables: easy
• Given network structure but only some variables
• When the network structure is not known in advance
Bayesian Belief Network
FamilyHistory    Smoker
LungCancer    Emphysema
PositiveXRay    Dyspnea

The conditional probability table for the variable LungCancer:

|     | (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S) |
|-----|---------|----------|----------|-----------|
| LC  | 0.8     | 0.5      | 0.7      | 0.1       |
| ~LC | 0.2     | 0.5      | 0.3      | 0.9       |
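To illustrate how such a CPT is used, here is a sketch that marginalizes out the parents to obtain P(LC). The priors P(FH) = 0.1 and P(S) = 0.3 are assumed purely for illustration; the slide does not give them:

```python
from itertools import product

# CPT for LungCancer from the slide: P(LC | FH, S)
p_lc_given = {
    (True, True): 0.8,    # (FH, S)
    (True, False): 0.5,   # (FH, ~S)
    (False, True): 0.7,   # (~FH, S)
    (False, False): 0.1,  # (~FH, ~S)
}

# Assumed priors (NOT given on the slide -- illustrative only)
p_fh = 0.1
p_s = 0.3

# Marginalize the parents: P(LC) = sum over fh, s of P(LC | fh, s) P(fh) P(s)
p_lc = sum(
    p_lc_given[(fh, s)]
    * (p_fh if fh else 1 - p_fh)
    * (p_s if s else 1 - p_s)
    for fh, s in product([True, False], repeat=2)
)
print(p_lc)  # ≈ 0.311 under the assumed priors
```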
A Hypothesis for playing tennis
Joint Probability Table
P(Outlook=sunny, Temperature=hot) = P(sunny, hot) = 2/14
P(Temperature=hot) = P(hot) = 2/14 + 2/14 + 0/14 = 4/14
With N random variables that can each take k values, the full joint probability table has k^N entries.

| Temperature \ Outlook | sunny | overcast | rainy |
|-----------------------|-------|----------|-------|
| hot                   | 2/14  | 2/14     | 0/14  |
| mild                  | 2/14  | 1/14     | 3/14  |
| cool                  | 1/14  | 1/14     | 2/14  |
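The joint-table lookup and the marginalization P(hot) above can be sketched as follows (with N = 2 variables of k = 3 values each, the table has 3² = 9 entries):

```python
from fractions import Fraction

F = lambda n: Fraction(n, 14)

# Joint probability table P(Temperature, Outlook) from the slide
joint = {
    ("hot",  "sunny"): F(2), ("hot",  "overcast"): F(2), ("hot",  "rainy"): F(0),
    ("mild", "sunny"): F(2), ("mild", "overcast"): F(1), ("mild", "rainy"): F(3),
    ("cool", "sunny"): F(1), ("cool", "overcast"): F(1), ("cool", "rainy"): F(2),
}

# A joint probability is a direct lookup
p_sunny_hot = joint[("hot", "sunny")]  # 2/14

# Marginalize out Outlook: P(hot) = sum over outlook values of P(hot, outlook)
p_hot = sum(p for (temp, _), p in joint.items() if temp == "hot")  # 4/14

print(p_sunny_hot, p_hot)
```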
Example: Calculating Global Probabilistic Beliefs
P(PlayTennis) = 9/14 = 0.64
P(~PlayTennis) = 5/14 = 0.36
P(Outlook=sunny|PlayTennis) = 2/9 = 0.22
P(Outlook=sunny|~PlayTennis) = 3/5 = 0.60
P(Temperature=cool|PlayTennis) = 3/9 = 0.33
P(Temperature=cool|~PlayTennis) = 1/5 = 0.20
P(Humidity=high|PlayTennis) = 3/9 = 0.33
P(Humidity=high|~PlayTennis) = 4/5 = 0.80
P(Wind=strong|PlayTennis) = 3/9 = 0.33
P(Wind=strong|~PlayTennis) = 3/5 = 0.60
Probability of Disjunctions
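The body of this slide did not survive extraction; the standard inclusion–exclusion rule for disjunctions, which follows from the axioms, is:

```latex
P(A \lor B) = P(A) + P(B) - P(A \land B)
```

For mutually exclusive events the last term vanishes, recovering the additivity axiom.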
Conditional Probability
Probabilities discussed so far are called prior probabilities or unconditional probabilities; they depend only on the data, not on any other variable.
But what if you have some evidence or knowledge about the situation? You now have a toothache. Now what is the probability of having a cavity?
Conditional Probability
Calculate conditional probabilities from data as follows: P(A|B) = P(A,B)/P(B)
What is P(sunny | hot)? From the joint table: P(sunny, hot)/P(hot) = (2/14)/(4/14) = 2/4
Conditional Probability
You can think of P(B) as just a normalization constant that makes P(A|B) add up to 1.
Product rule: P(A, B) = P(A|B)P(B) = P(B|A)P(A)
The chain rule is successive application of the product rule:
P(X1, …, Xn) = P(X1) P(X2|X1) P(X3|X1, X2) … P(Xn|X1, …, Xn−1)
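A quick sanity check of the product rule using the document's own weather numbers (P(sunny) = 5/14 from the outlook table; P(hot|sunny) = 2/5, since 2 of the 5 sunny days are hot):

```python
from fractions import Fraction

p_sunny = Fraction(5, 14)            # from the outlook probability table
p_hot_given_sunny = Fraction(2, 5)   # 2 of the 5 sunny days are hot

# Product rule: P(hot, sunny) = P(hot | sunny) * P(sunny)
p_hot_and_sunny = p_hot_given_sunny * p_sunny
print(p_hot_and_sunny)  # 1/7, i.e. 2/14 -- matches the joint probability table
```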
Conditional Independence
What if I know Weather=cloudy today. Now what is P(cavity)?
If knowing some evidence doesn’t change the probability of some other random variable, then we say the two random variables are independent.
A and B are independent if P(A|B) = P(A). Other ways of seeing this (all are equivalent): P(B|A) = P(B); P(A, B) = P(A)P(B)
Absolute independence is powerful but rare!
The independence hypothesis…
… makes computation possible
… yields optimal classifiers when satisfied
… but is seldom satisfied in practice, as attributes (variables) are often correlated.
Attempts to overcome this limitation:
• Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
• Decision trees, which reason on one attribute at a time, considering the most important attributes first
Conditional Independence
P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries.
If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache: P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity: P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity); P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Bayes’ Rule
Remember conditional probabilities: P(A|B) = P(A,B)/P(B), so P(B)P(A|B) = P(A,B); likewise P(B|A) = P(B,A)/P(A), so P(A)P(B|A) = P(B,A). Since P(B,A) = P(A,B), it follows that P(B)P(A|B) = P(A)P(B|A).
Bayes’ Rule: P(A|B) = P(B|A)P(A)/P(B)
Bayes’ Rule
A more general form is: P(Cause|Effect) = P(Effect|Cause)P(Cause)/P(Effect)
Bayes’ rule allows you to turn conditional probabilities on their head: useful for assessing diagnostic probability from causal probability.
E.g., let M be meningitis, S be stiff neck: P(M|S) = P(S|M)P(M)/P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Note posterior probability of meningitis still very small!
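The meningitis computation above can be checked directly:

```python
# Diagnostic probability from causal probability via Bayes' rule.
# Numbers from the slide: P(S|M) = 0.8, P(M) = 0.0001, P(S) = 0.1
p_s_given_m = 0.8
p_m = 0.0001
p_s = 0.1

# P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ≈ 0.0008 -- the posterior is still very small
```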
Classification with Bayes Rule
Naïve Bayes Model
Classify with the highest probability
One of the most widely used classifiers
Very fast to train and to classify:
• One pass over all data to train
• One lookup for each feature/class combination to classify
Assumes the features are independent given the class (conditional independence)
Naïve Bayes Classifier
Simplified assumption: features are conditionally independent given the class:
P(C | x1, …, xn) ∝ P(C) P(x1|C) P(x2|C) … P(xn|C)
Bayesian Classification: Why?
Probabilistic learning: Computation of explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
Benchmark: Even if Bayesian methods are computationally intractable, they can provide a benchmark for other algorithms
Classification with Bayes Rule
Courtesy, Simafore - http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics
Issues with naïve Bayes
Change in classifier data (on the fly, during classification)
The conditional independence assumption is violated. Consider the task of classifying whether or not a certain word is a corporation name, e.g. “Google,” “Microsoft,” “IBM,” and “ACME”.
Two useful features we might want to use are capitalized and all-capitals.
Naïve Bayes will assume that these two features are independent given the class, but this clearly isn’t the case (things that are all-caps must also be capitalized)!
However, naïve Bayes seems to work well in practice even when this assumption is violated.
Naïve Bayes Classifier
Naive Bayesian Classifier
Given a training set, we can compute the probabilities
| Outlook  | P   | N   |
|----------|-----|-----|
| sunny    | 2/9 | 3/5 |
| overcast | 4/9 | 0   |
| rain     | 3/9 | 2/5 |

| Temperature | P   | N   |
|-------------|-----|-----|
| hot         | 2/9 | 2/5 |
| mild        | 4/9 | 2/5 |
| cool        | 3/9 | 1/5 |

| Humidity | P   | N   |
|----------|-----|-----|
| high     | 3/9 | 4/5 |
| normal   | 6/9 | 1/5 |

| Windy | P   | N   |
|-------|-----|-----|
| true  | 3/9 | 3/5 |
| false | 6/9 | 2/5 |
Bayesian classification
The classification problem may be formalized using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C
E.g., P(class=N | outlook=sunny, windy=true, …)
Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
Bayes’ theorem: P(C|X) = P(X|C) · P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
The C such that P(C|X) is maximum is the C such that P(X|C) · P(C) is maximum
Problem: computing P(X|C) is infeasible!
Naïve Bayesian Classification
Naïve assumption: attribute independence, P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
Computationally easy in both cases
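A minimal sketch of the continuous case: fit a Gaussian to the attribute values observed within one class, then evaluate its density as the estimate of P(xi|C). The temperature values below are hypothetical, purely for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used to estimate P(x_i | C) for a continuous attribute."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical temperatures observed on 'play' days -- illustrative only
temps_play = [68.0, 70.0, 72.0, 75.0, 69.0]

# Fit mean and (sample) standard deviation for the class
mu = sum(temps_play) / len(temps_play)
var = sum((t - mu) ** 2 for t in temps_play) / (len(temps_play) - 1)
sigma = math.sqrt(var)

# Estimated class-conditional density at temperature = 71
print(gaussian_pdf(71.0, mu, sigma))
```

Note the result is a density, not a probability; that is fine for classification, since only the relative scores of the classes matter.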
Play-tennis example: estimating P(xi|C)
P(p) = 9/14
P(n) = 5/14
outlook
P(sunny|p) = 2/9 P(sunny|n) = 3/5
P(overcast|p) =4/9 P(overcast|n) = 0
P(rain|p) = 3/9 P(rain|n) = 2/5
temperature
P(hot|p) = 2/9 P(hot|n) = 2/5
P(mild|p) = 4/9 P(mild|n) = 2/5
P(cool|p) = 3/9 P(cool|n) = 1/5
humidity
P(high|p) = 3/9 P(high|n) = 4/5
P(normal|p) = 6/9 P(normal|n) = 1/5
windy
P(true|p) = 3/9 P(true|n) = 3/5
P(false|p) = 6/9 P(false|n) = 2/5
Play Tennis example
An unseen sample X = <rain, hot, high, false>
P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 ≈ 0.010582
P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 ≈ 0.018286
Sample X is classified in class n (don’t play)
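The whole classification can be reproduced with exact fractions; the priors and class-conditional probabilities below are copied from the play-tennis tables:

```python
from fractions import Fraction as F

# Class priors and class-conditional probabilities from the play-tennis tables
prior = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}

x = ["rain", "hot", "high", "false"]  # the unseen sample

# Naive Bayes score: P(C) times the product of per-attribute conditionals
score = {}
for c in ("p", "n"):
    s = prior[c]
    for value in x:
        s *= cond[c][value]
    score[c] = s

print(float(score["p"]))          # ≈ 0.010582
print(float(score["n"]))          # ≈ 0.018286
print(max(score, key=score.get))  # n -> don't play
```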
Conclusion & Further Reading
Probabilities
Joint Probabilities
Conditional Probabilities
Independence, Conditional Independence
Naïve Bayes Classifier
References
J. Han, M. Kamber. Data Mining. Morgan Kaufmann Publishers: San Francisco, CA.
E. Charniak. Bayesian Networks without Tears. AI Magazine. http://www.aaai.org/ojs/index.php/aimagazine/article/view/918
A. Darwiche. Bayesian Networks. Automated Reasoning Group, UCLA.