Belief Networks & Bayesian Classification

DESCRIPTION

An Introduction to Bayesian Belief Networks and Naïve Bayesian Classification

TRANSCRIPT

AN INTRODUCTION TO BAYESIAN BELIEF NETWORKS AND NAÏVE BAYESIAN CLASSIFICATION
ADNAN MASOOD
SCIS.NOVA.EDU/~ADNAN
Overview
Probability and Uncertainty
Probability Notation
Bayesian Statistics
Notation of Probability
Axioms of Probability
Probability Table
Bayesian Belief Network
Joint Probability Table
Probability of Disjunctions
Conditional Probability
Conditional Independence
Bayes' Rule
Classification with Bayes' Rule
Bayesian Classification
Conclusion & Further Reading
Probability and Uncertainty
Probability provides a way of summarizing uncertainty:
60% chance of rain today
85% chance of alarm in case of a burglary
Probability is calculated based upon past performance, or degree of belief.
Bayesian Statistics
Three approaches to Probability
Axiomatic: probability by definition and properties
Relative Frequency: repeated trials
Degree of belief (subjective): personal measure of uncertainty
Examples
The chance that a meteor strikes earth is 1%
The probability of rain today is 30%
The chance of getting an A on the exam is 50%
Notation of Probability
Random Variables (RV):
are (usually) capitalized, e.g. Sky, RoadCurvature, Temperature
refer to attributes of the world whose “status” is unknown
have one and only one value at a time
have a domain of values that are possible states of the world:
Boolean: domain is {true, false}
Discrete: domain is countable (includes Boolean); values are exhaustive and mutually exclusive, e.g. the domain of Sky
Continuous: domain is the real numbers
Notation of Probability
Uncertainty is represented by P(X = x), or simply P(x): the degree of belief that variable X takes on value x given no other information. Because it relates to a single variable with no evidence, this is called an unconditional or prior probability.
Properties of P(X):
The sum over all values in the domain of variable X is 1, because the domain is exhaustive and mutually exclusive.
Axioms of Probability
S – Sample Space (set of possible outcomes)
E – Some Event (some subset of outcomes)
Axioms
0 ≤ P(E) ≤ 1
P(S) = 1
For any sequence of mutually exclusive events E1, E2, …: P(E1 ∪ E2 ∪ …) = P(E1) + P(E2) + …
Probability Table
Calculate probabilities from data:

| Outlook | sunny | overcast | rainy |
|---------|-------|----------|-------|
| P       | 5/14  | 4/14     | 5/14  |

P(Weather=sunny) = P(sunny) = 5/14
P(Weather) = {5/14, 4/14, 5/14}
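The table above, estimating P(Weather) as relative frequencies, can be sketched in Python. The `outlooks` list is a hypothetical stand-in for the dataset's 14 Outlook observations (5 sunny, 4 overcast, 5 rainy, matching the counts on the slide):

```python
from collections import Counter
from fractions import Fraction

# Hypothetical 14 observations of the Outlook attribute,
# matching the counts in the probability table above.
outlooks = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

counts = Counter(outlooks)
n = len(outlooks)

# P(Weather = v) = count(v) / total observations
p_weather = {v: Fraction(c, n) for v, c in counts.items()}

print(p_weather["sunny"])       # 5/14
print(sum(p_weather.values()))  # 1 -- the domain is exhaustive and mutually exclusive
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not subject to floating-point rounding.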
An expert-built belief network using the weather dataset (Mitchell; Witten & Frank)
Bayesian inference can help answer questions like the probability of game play if
a. Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong
b. Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong
Bayesian Belief Network
A Bayesian belief network allows a subset of the variables to be conditionally independent
A graphical model of causal relationships
Several cases of learning Bayesian belief networks:
• Given both network structure and all the variables: easy
• Given network structure but only some variables
• When the network structure is not known in advance
Bayesian Belief Network
FamilyHistory    Smoker
LungCancer    Emphysema
PositiveXRay    Dyspnea

The conditional probability table for the variable LungCancer:

|     | (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S) |
|-----|---------|----------|----------|-----------|
| LC  | 0.8     | 0.5      | 0.7      | 0.1       |
| ~LC | 0.2     | 0.5      | 0.3      | 0.9       |
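To illustrate how such a CPT is used, here is a sketch that marginalizes out the parents to obtain P(LC). The priors P(FH) = 0.1 and P(S) = 0.3 are assumed purely for illustration; the slide does not give them:

```python
from itertools import product

# CPT for LungCancer from the slide: P(LC | FH, S)
p_lc_given = {
    (True, True): 0.8,    # (FH, S)
    (True, False): 0.5,   # (FH, ~S)
    (False, True): 0.7,   # (~FH, S)
    (False, False): 0.1,  # (~FH, ~S)
}

# Assumed priors (NOT given on the slide -- illustrative only)
p_fh = 0.1
p_s = 0.3

# Marginalize the parents: P(LC) = sum over fh, s of P(LC | fh, s) P(fh) P(s)
p_lc = sum(
    p_lc_given[(fh, s)]
    * (p_fh if fh else 1 - p_fh)
    * (p_s if s else 1 - p_s)
    for fh, s in product([True, False], repeat=2)
)
print(p_lc)  # ≈ 0.311 under the assumed priors
```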
A Hypothesis for playing tennis
Joint Probability Table
P(Outlook=sunny, Temperature=hot) = P(sunny, hot) = 2/14
P(Temperature=hot) = P(hot) = 2/14 + 2/14 + 0/14 = 4/14
With N random variables that can each take k values, the full joint probability table has k^N entries.

| Temperature \ Outlook | sunny | overcast | rainy |
|-----------------------|-------|----------|-------|
| hot                   | 2/14  | 2/14     | 0/14  |
| mild                  | 2/14  | 1/14     | 3/14  |
| cool                  | 1/14  | 1/14     | 2/14  |
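The joint-table lookup and the marginalization P(hot) above can be sketched as follows (with N = 2 variables of k = 3 values each, the table has 3² = 9 entries):

```python
from fractions import Fraction

F = lambda n: Fraction(n, 14)

# Joint probability table P(Temperature, Outlook) from the slide
joint = {
    ("hot",  "sunny"): F(2), ("hot",  "overcast"): F(2), ("hot",  "rainy"): F(0),
    ("mild", "sunny"): F(2), ("mild", "overcast"): F(1), ("mild", "rainy"): F(3),
    ("cool", "sunny"): F(1), ("cool", "overcast"): F(1), ("cool", "rainy"): F(2),
}

# A joint probability is a direct lookup
p_sunny_hot = joint[("hot", "sunny")]  # 2/14

# Marginalize out Outlook: P(hot) = sum over outlook values of P(hot, outlook)
p_hot = sum(p for (temp, _), p in joint.items() if temp == "hot")  # 4/14

print(p_sunny_hot, p_hot)
```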
Example: Calculating Global Probabilistic Beliefs
P(PlayTennis) = 9/14 = 0.64
P(~PlayTennis) = 5/14 = 0.36
P(Outlook=sunny|PlayTennis) = 2/9 = 0.22
P(Outlook=sunny|~PlayTennis) = 3/5 = 0.60
P(Temperature=cool|PlayTennis) = 3/9 = 0.33
P(Temperature=cool|~PlayTennis) = 1/5 = 0.20
P(Humidity=high|PlayTennis) = 3/9 = 0.33
P(Humidity=high|~PlayTennis) = 4/5 = 0.80
P(Wind=strong|PlayTennis) = 3/9 = 0.33
P(Wind=strong|~PlayTennis) = 3/5 = 0.60
Probability of Disjunctions
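The body of this slide did not survive extraction; the standard inclusion–exclusion rule for disjunctions, which follows from the axioms, is:

```latex
P(A \lor B) = P(A) + P(B) - P(A \land B)
```

For mutually exclusive events the last term vanishes, recovering the additivity axiom.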
Conditional Probability
Probabilities discussed so far are called prior probabilities or unconditional probabilities; they depend only on the data, not on any other variable.
But what if you have some evidence or knowledge about the situation? You now have a toothache. Now what is the probability of having a cavity?
Conditional Probability
Calculate conditional probabilities from data as follows: P(A|B) = P(A,B)/P(B)
What is P(sunny | hot)? From the joint table: P(sunny, hot)/P(hot) = (2/14)/(4/14) = 2/4
Conditional Probability
You can think of P(B) as just a normalization constant that makes P(A|B) add up to 1.
Product rule: P(A, B) = P(A|B)P(B) = P(B|A)P(A)
The chain rule is successive application of the product rule:
P(X1, …, Xn) = P(X1) P(X2|X1) P(X3|X1, X2) … P(Xn|X1, …, Xn−1)
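A quick sanity check of the product rule using the document's own weather numbers (P(sunny) = 5/14 from the outlook table; P(hot|sunny) = 2/5, since 2 of the 5 sunny days are hot):

```python
from fractions import Fraction

p_sunny = Fraction(5, 14)            # from the outlook probability table
p_hot_given_sunny = Fraction(2, 5)   # 2 of the 5 sunny days are hot

# Product rule: P(hot, sunny) = P(hot | sunny) * P(sunny)
p_hot_and_sunny = p_hot_given_sunny * p_sunny
print(p_hot_and_sunny)  # 1/7, i.e. 2/14 -- matches the joint probability table
```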
Conditional Independence
What if I know Weather=cloudy today. Now what is P(cavity)?
If knowing some evidence doesn’t change the probability of some other random variable, then we say the two random variables are independent.
A and B are independent if P(A|B) = P(A). Other ways of seeing this (all are equivalent): P(B|A) = P(B); P(A, B) = P(A)P(B)
Absolute independence is powerful but rare!
The independence hypothesis…
… makes computation possible
… yields optimal classifiers when satisfied
… but is seldom satisfied in practice, as attributes (variables) are often correlated.
Attempts to overcome this limitation:
• Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
• Decision trees, which reason on one attribute at a time, considering the most important attributes first
Conditional Independence
P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries.
If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache: P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity: P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity); P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Bayes’ Rule
Remember conditional probabilities: P(A|B) = P(A,B)/P(B), so P(B)P(A|B) = P(A,B); likewise P(B|A) = P(B,A)/P(A), so P(A)P(B|A) = P(B,A). Since P(B,A) = P(A,B), it follows that P(B)P(A|B) = P(A)P(B|A).
Bayes’ Rule: P(A|B) = P(B|A)P(A)/P(B)
Bayes’ Rule
A more general form is: P(Cause|Effect) = P(Effect|Cause)P(Cause)/P(Effect)
Bayes’ rule allows you to turn conditional probabilities on their head: useful for assessing diagnostic probability from causal probability.
E.g., let M be meningitis, S be stiff neck: P(M|S) = P(S|M)P(M)/P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Note posterior probability of meningitis still very small!
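The meningitis computation above can be checked directly:

```python
# Diagnostic probability from causal probability via Bayes' rule.
# Numbers from the slide: P(S|M) = 0.8, P(M) = 0.0001, P(S) = 0.1
p_s_given_m = 0.8
p_m = 0.0001
p_s = 0.1

# P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ≈ 0.0008 -- the posterior is still very small
```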
Classification with Bayes Rule
Naïve Bayes Model
Classify with the highest probability
One of the most widely used classifiers
Very fast to train and to classify:
• One pass over all data to train
• One lookup for each feature/class combination to classify
Assumes the features are independent given the class (conditional independence)
Naïve Bayes Classifier
Simplified assumption: features are conditionally independent given the class:
P(C | x1, …, xn) ∝ P(C) P(x1|C) P(x2|C) … P(xn|C)
Bayesian Classification: Why?
Probabilistic learning: Computation of explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
Benchmark: Even if Bayesian methods are computationally intractable, they can provide a benchmark for other algorithms
Classification with Bayes Rule
Courtesy, Simafore - http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics
Issues with naïve Bayes
Change in classifier data (on the fly, during classification)
The conditional independence assumption is violated. Consider the task of classifying whether or not a certain word is a corporation name, e.g. “Google,” “Microsoft,” “IBM,” and “ACME”.
Two useful features we might want to use are capitalized and all-capitals.
Naïve Bayes will assume that these two features are independent given the class, but this clearly isn’t the case (things that are all-caps must also be capitalized)!
However, naïve Bayes seems to work well in practice even when this assumption is violated.
Naïve Bayes Classifier
Naive Bayesian Classifier
Given a training set, we can compute the probabilities
| Outlook  | P   | N   |
|----------|-----|-----|
| sunny    | 2/9 | 3/5 |
| overcast | 4/9 | 0   |
| rain     | 3/9 | 2/5 |

| Temperature | P   | N   |
|-------------|-----|-----|
| hot         | 2/9 | 2/5 |
| mild        | 4/9 | 2/5 |
| cool        | 3/9 | 1/5 |

| Humidity | P   | N   |
|----------|-----|-----|
| high     | 3/9 | 4/5 |
| normal   | 6/9 | 1/5 |

| Windy | P   | N   |
|-------|-----|-----|
| true  | 3/9 | 3/5 |
| false | 6/9 | 2/5 |
Bayesian classification
The classification problem may be formalized using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C
E.g., P(class=N | outlook=sunny, windy=true, …)
Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
Bayes’ theorem: P(C|X) = P(X|C) · P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
The C such that P(C|X) is maximum is the C such that P(X|C) · P(C) is maximum
Problem: computing P(X|C) is infeasible!
Naïve Bayesian Classification
Naïve assumption: attribute independence, P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
Computationally easy in both cases
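A minimal sketch of the continuous case: fit a Gaussian to the attribute values observed within one class, then evaluate its density as the estimate of P(xi|C). The temperature values below are hypothetical, purely for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used to estimate P(x_i | C) for a continuous attribute."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical temperatures observed on 'play' days -- illustrative only
temps_play = [68.0, 70.0, 72.0, 75.0, 69.0]

# Fit mean and (sample) standard deviation for the class
mu = sum(temps_play) / len(temps_play)
var = sum((t - mu) ** 2 for t in temps_play) / (len(temps_play) - 1)
sigma = math.sqrt(var)

# Estimated class-conditional density at temperature = 71
print(gaussian_pdf(71.0, mu, sigma))
```

Note the result is a density, not a probability; that is fine for classification, since only the relative scores of the classes matter.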
Play-tennis example: estimating P(xi|C)
P(p) = 9/14
P(n) = 5/14
outlook
P(sunny|p) = 2/9 P(sunny|n) = 3/5
P(overcast|p) =4/9 P(overcast|n) = 0
P(rain|p) = 3/9 P(rain|n) = 2/5
temperature
P(hot|p) = 2/9 P(hot|n) = 2/5
P(mild|p) = 4/9 P(mild|n) = 2/5
P(cool|p) = 3/9 P(cool|n) = 1/5
humidity
P(high|p) = 3/9 P(high|n) = 4/5
P(normal|p) = 6/9 P(normal|n) = 1/5
windy
P(true|p) = 3/9 P(true|n) = 3/5
P(false|p) = 6/9 P(false|n) = 2/5
Play Tennis example
An unseen sample X = <rain, hot, high, false>
P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 ≈ 0.010582
P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 ≈ 0.018286
Sample X is classified in class n (don’t play)
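The whole classification can be reproduced with exact fractions; the priors and class-conditional probabilities below are copied from the play-tennis tables:

```python
from fractions import Fraction as F

# Class priors and class-conditional probabilities from the play-tennis tables
prior = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}

x = ["rain", "hot", "high", "false"]  # the unseen sample

# Naive Bayes score: P(C) times the product of per-attribute conditionals
score = {}
for c in ("p", "n"):
    s = prior[c]
    for value in x:
        s *= cond[c][value]
    score[c] = s

print(float(score["p"]))          # ≈ 0.010582
print(float(score["n"]))          # ≈ 0.018286
print(max(score, key=score.get))  # n -> don't play
```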
Conclusion & Further Reading
Probabilities
Joint Probabilities
Conditional Probabilities
Independence, Conditional Independence
Naïve Bayes Classifier
References
J. Han, M. Kamber. Data Mining. Morgan Kaufmann Publishers: San Francisco, CA.
E. Charniak. Bayesian Networks without Tears. AI Magazine. http://www.aaai.org/ojs/index.php/aimagazine/article/view/918
A. Darwiche. Bayesian Networks. Automated Reasoning Group, UCLA.