Lecture 07 - Bayesian Learning - 1
Bayesian Classifiers
Bayesian classifiers are statistical classifiers, based on Bayes' theorem.
They can calculate the probability that a given sample belongs to a particular class.
Bayesian learning algorithms are among the most practical approaches to certain types of learning problems.
In many cases their results are comparable to the performance of other classifiers, such as decision trees and neural networks.
Bayes' Theorem
Let X be a data sample, e.g. a red and round fruit.
Let H be some hypothesis, such as that X belongs to a specified class C (e.g. X is an apple).
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the observed data sample X.
Prior & Posterior Probability
The probability P(H) is called the prior probability of H, i.e. the probability that any given data sample is an apple, regardless of how the data sample looks.
The probability P(H|X) is called the posterior probability. It is based on more information than the prior probability P(H), which is independent of X.
Bayes' Theorem
It provides a way of calculating the posterior probability:
P(H|X) = P(X|H) P(H) / P(X)
P(X|H) is the posterior probability of X conditioned on H (it is the probability that X is red and round, given that X is an apple).
P(X) is the prior probability of X (the probability that a data sample is red and round).
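As a quick illustration, the formula can be applied directly. A minimal sketch in Python, where the fruit proportions are hypothetical numbers chosen only for this example:

# Hypothetical proportions for the fruit example (illustrative only)
p_h = 0.20           # P(H): prior probability that a fruit is an apple
p_x_given_h = 0.90   # P(X|H): probability that an apple is red and round
p_x = 0.30           # P(X): probability that any fruit is red and round

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)   # ~0.6: posterior probability that a red, round fruit is an apple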
Bayes' Theorem: Proof
The posterior probability of the fruit being an apple, given that its shape is round and its colour is red, is
P(H|X) = |HX| / |X|
i.e. the number of apples which are red and round, divided by the total number of red and round fruits.
Since P(HX) = |HX| / |total fruits of all sizes and shapes|
and P(X) = |X| / |total fruits of all sizes and shapes|,
hence P(H|X) = P(HX) / P(X).
Bayes' Theorem: Proof
Similarly, P(X|H) = P(HX) / P(H).
Since we have P(HX) = P(H|X) P(X)
and also P(HX) = P(X|H) P(H),
therefore P(H|X) P(X) = P(X|H) P(H),
and hence P(H|X) = P(X|H) P(H) / P(X).
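The same identities can be checked numerically. A minimal sketch with hypothetical fruit counts (any counts would do, since the equalities hold by construction):

# Hypothetical counts over a basket of fruit (illustrative only)
total = 100   # total fruits of all sizes and shapes
h = 25        # |H|:  number of apples
x = 30        # |X|:  number of red and round fruits
hx = 20       # |HX|: number of apples that are red and round

p_h, p_x, p_hx = h / total, x / total, hx / total
p_h_given_x = p_hx / p_x   # P(H|X) = P(HX) / P(X)
p_x_given_h = p_hx / p_h   # P(X|H) = P(HX) / P(H)

# Both routes agree: P(H|X) = P(X|H) P(H) / P(X)
assert abs(p_h_given_x - p_x_given_h * p_h / p_x) < 1e-12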
Naïve (Simple) Bayesian Classification
Studies comparing classification algorithms have found that the simple Bayesian classifier is comparable in performance with decision tree and neural network classifiers.
It works as follows:
1. Each data sample is represented by an n-dimensional feature vector, X = (x1, x2, ..., xn), depicting n measurements made on the sample from n attributes, respectively A1, A2, ..., An.
Naïve (Simple) Bayesian Classification
2. Suppose that there are m classes, C1, C2, ..., Cm. Given an unknown data sample X (i.e. having no class label), the classifier will predict that X belongs to the class having the highest posterior probability given X.
Thus X is assigned to Ci if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
This is called the Bayes decision rule.
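In code, the decision rule is simply an argmax over classes. A minimal sketch, where posterior is assumed to be any function returning P(Ci|X), or a score proportional to it, since only the ordering matters:

def bayes_decision(x, classes, posterior):
    # Assign x to the class Ci with the highest posterior P(Ci|x)
    return max(classes, key=lambda c: posterior(c, x))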
Naïve (Simple) Bayesian Classification
3. We have P(Ci|X) = P(X|Ci) P(Ci) / P(X).
As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be calculated.
The class prior probabilities may be estimated by
P(Ci) = si / s
where si is the number of training samples of class Ci and s is the total number of training samples.
If the class prior probabilities are equal (or are not known and thus assumed to be equal), then we need to calculate only P(X|Ci).
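Estimating the class priors is plain relative-frequency counting over the training labels, e.g.:

from collections import Counter

def class_priors(labels):
    # P(Ci) = si / s, where si is the count of class Ci in the training set
    s = len(labels)
    return {c: si / s for c, si in Counter(labels).items()}

print(class_priors(9 * ["yes"] + 5 * ["no"]))  # {'yes': 0.642..., 'no': 0.357...}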
Naïve (Simple) Bayesian Classification
4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci).
For example, assuming the attributes colour and shape to be Boolean, we need to store 4 probabilities for the category apple:
P(red ∧ round | apple)
P(red ∧ ¬round | apple)
P(¬red ∧ round | apple)
P(¬red ∧ ¬round | apple)
If there are 6 attributes and they are Boolean, then we need to store 2^6 probabilities.
Naïve (Simple) Bayesian Classification
In order to reduce computation, the naïve assumption of class conditional independence is made.
This presumes that the values of the attributes are conditionally independent of one another, given the class label of the sample (we assume that there are no dependence relationships among the attributes).
Naïve (Simple) Bayesian Classification
Thus we assume that P(X|Ci) = ∏k=1..n P(xk|Ci).
Example:
P(colour ∧ shape | apple) = P(colour | apple) × P(shape | apple)
For 6 Boolean attributes, we would have only 12 probabilities to store instead of 2^6 = 64.
Similarly, for 6 three-valued attributes, we would have 18 probabilities to store instead of 3^6 = 729.
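Under this assumption the likelihood factorizes into a product of per-attribute probabilities. A minimal sketch, where cond_prob is an assumed lookup table holding the stored values P(xk|Ci), keyed by (attribute, value, class):

import math

def naive_likelihood(x, c, cond_prob):
    # P(X|Ci) = product over all attributes k of P(xk|Ci),
    # assuming class-conditional independence; x maps attribute -> value
    return math.prod(cond_prob[(attr, value, c)] for attr, value in x.items())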
Naïve (Simple) Bayesian Classification
The probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) can be estimated from the training samples as follows.
For an attribute Ak, which can take on the values x1k, x2k, ... (e.g. colour = red, green, ...),
P(xk|Ci) = sik / si
where sik is the number of training samples of class Ci having the value xk for Ak, and si is the number of training samples belonging to Ci.
e.g. P(red | apple) = 7/10 if 7 out of 10 apples are red.
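Putting the estimates together, both P(Ci) and P(xk|Ci) can be obtained by counting over the training set. A minimal sketch (no smoothing, so an attribute value never seen with a class gets probability zero):

from collections import Counter, defaultdict

def fit_naive_bayes(samples, labels):
    # samples: list of dicts mapping attribute -> value; labels: class per sample
    s = len(labels)
    si = Counter(labels)    # si:  number of samples per class Ci
    sik = defaultdict(int)  # sik: number of samples of class Ci with Ak = xk
    for x, c in zip(samples, labels):
        for attr, value in x.items():
            sik[(attr, value, c)] += 1
    priors = {c: n / s for c, n in si.items()}         # P(Ci) = si / s
    cond = {k: n / si[k[2]] for k, n in sik.items()}   # P(xk|Ci) = sik / si
    return priors, cond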
Naïve (Simple) Bayesian Classification
Example: consider the following training data, where each sample has the attributes age, income, student, and credit_rating, and the class label buy_computer:

age     income  student  credit_rating  buy_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Naïve (Simple) Bayesian Classification
Example:
Let C1 = the class buy_computer = yes and C2 = the class buy_computer = no.
The unknown sample:
X = {age <= 30, income = medium, student = yes, credit_rating = fair}
The prior probability of each class can be computed as
P(buy_computer = yes) = 9/14 = 0.643
P(buy_computer = no) = 5/14 = 0.357
Naïve (Simple) Bayesian Classification
Example:
To compute P(X|Ci), we compute the following conditional probabilities from the training samples:
P(age <= 30 | buy_computer = yes) = 2/9 = 0.222
P(age <= 30 | buy_computer = no) = 3/5 = 0.600
P(income = medium | buy_computer = yes) = 4/9 = 0.444
P(income = medium | buy_computer = no) = 2/5 = 0.400
P(student = yes | buy_computer = yes) = 6/9 = 0.667
P(student = yes | buy_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buy_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buy_computer = no) = 2/5 = 0.400
Naïve (Simple) Bayesian Classification
Example:
Using the above probabilities we obtain
P(X | buy_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buy_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
And hence the naïve Bayesian classifier predicts that the student will buy a computer, because
P(X | yes) P(yes) = 0.044 × 0.643 = 0.028 > P(X | no) P(no) = 0.019 × 0.357 = 0.007
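The arithmetic above is easy to reproduce; a short sketch using the fractions read off the training table:

p_yes, p_no = 9 / 14, 5 / 14
# P(X|Ci) as a product of the per-attribute conditional probabilities
p_x_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)   # ~0.044
p_x_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)    # ~0.019
print(p_x_yes * p_yes, p_x_no * p_no)  # ~0.028 > ~0.007, so predict buy_computer = yes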
Naïve (Simple) Bayesian Classification
An Example: Learning to Classify Text
- Instances (training samples) are text documents
- Classification labels can be: like / dislike, etc.
- The task is to learn from these training examples to predict the class of unseen documents
Design issue:
- How to represent a text document in terms of attribute values
Naïve (Simple) Bayesian Classification
One approach:
- The attributes are the word positions
- The value of an attribute is the word found in that position
Note that the number of attributes may be different for each document.
We calculate the prior probabilities of the classes from the training samples.
The probability of each word occurring in each position is also calculated,
e.g. P("The" in first position | like document)
Naïve (Simple) Bayesian Classification
Second approach:
The frequency with which a word occurs is counted, irrespective of the word's position.
Note that here also the number of attributes may be different for each document.
The probabilities of words are of the form
e.g. P("The" | like document)
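A minimal sketch of this estimate, where docs is assumed to be a list of (text, label) pairs; the probabilities are raw relative frequencies, without the smoothing a practical implementation would add:

from collections import Counter

def word_probs(docs, target_class):
    # P(w | class): relative frequency of word w over all word positions
    # in documents of the target class, irrespective of position
    counts, total = Counter(), 0
    for text, label in docs:
        if label == target_class:
            words = text.lower().split()
            counts.update(words)
            total += len(words)
    return {w: n / total for w, n in counts.items()}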
Naïve (Simple) Bayesian Classification
Results
An algorithm based on the second approach was applied to the problem of classifying newsgroup articles:
- 20 newsgroups were considered
- 1,000 articles from each newsgroup were collected (20,000 articles in total)
- The naïve Bayes algorithm was trained on two-thirds of these articles
- Testing was done on the remaining third
Naïve (Simple) Bayesian Classification
Results
- Given 20 newsgroups, we would expect random guessing to achieve a classification accuracy of 5%
- The accuracy achieved by this program was 89%
Naïve (Simple) Bayesian Classification
Minor Variant
The algorithm used only a subset of the words occurring in the documents:
- The 100 most frequent words were removed (these include words such as "the", "and", "of")
- Any word occurring fewer than 3 times was also removed
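A minimal sketch of that pruning step (the cutoffs 100 and 3 are the ones quoted above):

from collections import Counter

def prune_vocabulary(documents):
    # Drop the 100 most frequent words and any word occurring fewer than 3 times
    counts = Counter(w for doc in documents for w in doc.lower().split())
    most_frequent = {w for w, _ in counts.most_common(100)}
    return {w for w, n in counts.items() if n >= 3 and w not in most_frequent}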
Reference
Chapter 6 of T. Mitchell, Machine Learning, McGraw-Hill, 1997.