Lecture 07 - Bayesian Learning - 1



Bayesian Classifiers

Bayesian classifiers are statistical classifiers based on Bayes' theorem.

They can calculate the probability that a given sample belongs to a particular class.


Bayesian learning algorithms are among the most practical approaches to certain types of learning problems.

In many cases their results are comparable to the performance of other classifiers, such as decision trees and neural networks.


Bayes' Theorem

Let X be a data sample, e.g. a red and round fruit.

Let H be some hypothesis, such as that X belongs to a specified class C (e.g. X is an apple).

For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the observed data sample X.


Prior & Posterior Probability

The probability P(H) is called the prior probability of H, i.e. the probability that any given data sample is an apple, regardless of how the data sample looks.

The probability P(H|X) is called the posterior probability. It is based on more information than the prior probability P(H), which is independent of X.


Bayes' Theorem

It provides a way of calculating the posterior probability:

P(H|X) = P(X|H) P(H) / P(X)

P(X|H) is the posterior probability of X given H (it is the probability that X is red and round, given that X is an apple).

P(X) is the prior probability of X (the probability that a data sample is red and round).
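As a quick numeric illustration (the three probabilities below are invented for the sketch, not taken from the lecture), the theorem can be applied directly:

```python
# A minimal sketch of Bayes' theorem; the input probabilities are invented.
p_h = 0.30           # P(H): prior probability that a fruit is an apple
p_x_given_h = 0.80   # P(X|H): probability an apple is red and round
p_x = 0.40           # P(X): probability that any fruit is red and round

# Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)   # ~0.6: about 60% of red and round fruits are apples
```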


Bayes' Theorem: Proof

The posterior probability of the fruit being an apple, given that its shape is round and its colour is red, is

P(H|X) = |H ∩ X| / |X|

i.e. the number of apples which are red and round, divided by the total number of red and round fruits.

Since P(H ∩ X) = |H ∩ X| / |total fruits of all sizes and shapes| and P(X) = |X| / |total fruits of all sizes and shapes|, we have

P(H|X) = P(H ∩ X) / P(X)


Similarly, P(X|H) = P(H ∩ X) / P(H).

Since we have P(H ∩ X) = P(H|X) P(X) and also P(H ∩ X) = P(X|H) P(H), therefore

P(H|X) P(X) = P(X|H) P(H)

and hence

P(H|X) = P(X|H) P(H) / P(X)
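The identity is easy to check numerically with counts; the basket sizes below are hypothetical:

```python
# Check P(H|X) = P(X|H) P(H) / P(X) on hypothetical fruit counts.
total = 100   # all fruits of all sizes and shapes
hx = 24       # |H ∩ X|: apples that are red and round
x = 40        # |X|: all red and round fruits
h = 30        # |H|: all apples

p_hx, p_x, p_h = hx / total, x / total, h / total
p_x_given_h = p_hx / p_h            # P(X|H) = P(H ∩ X) / P(H)

# Both routes to the posterior agree.
assert abs(p_hx / p_x - p_x_given_h * p_h / p_x) < 1e-12
print(p_hx / p_x)                   # 0.6
```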


Naïve (Simple) Bayesian Classification

Studies comparing classification algorithms have found that the simple Bayesian classifier is comparable in performance with decision tree and neural network classifiers.

It works as follows:

1. Each data sample is represented by an n-dimensional feature vector, X = (x1, x2, …, xn), depicting n measurements made on the sample from the n attributes A1, A2, …, An respectively.


2. Suppose that there are m classes C1, C2, …, Cm. Given an unknown data sample X (i.e. one having no class label), the classifier will predict that X belongs to the class having the highest posterior probability given X.

Thus, if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i, then X is assigned to Ci.

This is called the Bayes decision rule.


3. We have P(Ci|X) = P(X|Ci) P(Ci) / P(X).

As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be calculated.

The class prior probabilities may be estimated by

P(Ci) = si / s

where si is the number of training samples of class Ci and s is the total number of training samples.

If the class prior probabilities are equal (or are not known and thus assumed to be equal), then we need to calculate only P(X|Ci).
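A small sketch of steps 2 and 3 together (the class names, counts and likelihood values are illustrative, not from the lecture): estimate P(Ci) = si / s from label counts, then apply the Bayes decision rule.

```python
# Estimate class priors P(Ci) = s_i / s from (illustrative) label counts,
# then assign X to the class maximizing P(X|Ci) * P(Ci).
class_counts = {"apple": 30, "tomato": 70}            # s_i per class
s = sum(class_counts.values())                        # s, total samples
priors = {c: n / s for c, n in class_counts.items()}  # P(Ci) = s_i / s

# Assume these likelihoods P(X|Ci) were estimated elsewhere.
likelihoods = {"apple": 0.80, "tomato": 0.35}

# Bayes decision rule: argmax over classes of P(X|Ci) * P(Ci).
predicted = max(priors, key=lambda c: likelihoods[c] * priors[c])
print(predicted)  # tomato: 0.35 * 0.7 = 0.245 beats 0.80 * 0.3 = 0.240
```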


4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci).

For example, assuming the attributes colour and shape to be Boolean, we need to store 4 probabilities for the category apple:

P(red ∧ round | apple)
P(¬red ∧ round | apple)
P(red ∧ ¬round | apple)
P(¬red ∧ ¬round | apple)

If there are 6 attributes and they are Boolean, then we need to store 2^6 = 64 probabilities.


In order to reduce computation, the naïve assumption of class conditional independence is made.

This presumes that the values of the attributes are conditionally independent of one another, given the class label of the sample (we assume that there are no dependence relationships among the attributes).


Thus we assume that

P(X|Ci) = ∏_{k=1}^{n} P(xk|Ci)

Example: P(colour ∧ shape | apple) = P(colour | apple) P(shape | apple)

For 6 Boolean attributes, we would have only 12 probabilities to store instead of 2^6 = 64.

Similarly, for 6 three-valued attributes, we would have 18 probabilities to store instead of 3^6 = 729.
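The savings can be tallied directly; the sketch below just counts parameters and shows the factorization (the two factor values are invented):

```python
from math import prod

n = 6  # number of attributes

# Full-joint versus naive parameter counts per class.
print(2 ** n, 2 * n)   # Boolean attributes: 64 vs 12
print(3 ** n, 3 * n)   # three-valued attributes: 729 vs 18

# The factorization itself: P(X|Ci) is a product of per-attribute terms,
# e.g. P(colour ∧ shape | apple) = P(colour|apple) * P(shape|apple).
p_colour_given_apple = 0.7   # invented values
p_shape_given_apple = 0.9
print(prod([p_colour_given_apple, p_shape_given_apple]))  # ~0.63
```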


The probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) can be estimated from the training samples as follows.

For an attribute Ak, which can take on the values x1k, x2k, … (e.g. colour = red, green, …),

P(xk|Ci) = sik / si

where sik is the number of training samples of class Ci having the value xk for Ak, and si is the number of training samples belonging to Ci.

e.g. P(red|apple) = 7/10, if 7 out of 10 apples are red.
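A sketch of this estimate on invented labeled samples, reproducing the P(red|apple) = 7/10 figure:

```python
# Estimate P(x_k | C_i) = s_ik / s_i from labeled samples.
# The ten (colour, class) pairs below are invented: 7 red apples,
# 3 green apples.
samples = [
    ("red", "apple"), ("red", "apple"), ("green", "apple"),
    ("red", "apple"), ("red", "apple"), ("green", "apple"),
    ("red", "apple"), ("red", "apple"), ("red", "apple"),
    ("green", "apple"),
]

s_i = sum(1 for _, label in samples if label == "apple")
s_ik = sum(1 for colour, label in samples
           if label == "apple" and colour == "red")
print(s_ik / s_i)  # 0.7, i.e. P(red|apple) = 7/10
```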


Example:

[The training data table shown on this slide is not included in the transcript: 14 labeled samples with attributes age, income, student and credit-rating, and class label buy_computer.]


Let C1 = the class buy_computer = yes, and C2 = the class buy_computer = no.

The unknown sample is X = {age <= 30, income = medium, student = yes, credit-rating = fair}.

The prior probability of each class can be computed as

P(buy_computer = yes) = 9/14 = 0.643
P(buy_computer = no) = 5/14 = 0.357


To compute P(X|Ci), we compute the following conditional probabilities:

[The table of conditional probabilities shown on this slide is not included in the transcript.]


Using the above probabilities we obtain P(X|C1) P(C1) and P(X|C2) P(C2), and hence the naïve Bayesian classifier predicts that the student will buy a computer, because P(X|C1) P(C1) > P(X|C2) P(C2). (The numeric values shown on this slide are not included in the transcript.)
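The training table and the intermediate conditional probabilities did not survive in this transcript. The priors above (9/14 and 5/14) match the well-known 14-sample buys_computer table from Han & Kamber, so the sketch below assumes that dataset in order to reproduce the prediction end to end; treat the rows as an assumption, not part of the original slides.

```python
from collections import Counter

# Assumed training data: the standard 14-sample buys_computer table
# (Han & Kamber), consistent with the 9/14 and 5/14 priors above.
# Columns: age, income, student, credit_rating, class.
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")   # the unknown sample X

class_counts = Counter(row[-1] for row in data)   # {'yes': 9, 'no': 5}
scores = {}
for c, s_i in class_counts.items():
    score = s_i / len(data)                       # prior P(Ci)
    for k, value in enumerate(x):                 # naive product over attrs
        s_ik = sum(1 for row in data if row[-1] == c and row[k] == value)
        score *= s_ik / s_i                       # P(xk|Ci) = s_ik / s_i
    scores[c] = score

print(scores)                        # yes: ~0.0282, no: ~0.0069
print(max(scores, key=scores.get))   # 'yes': the student buys a computer
```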


An Example: Learning to Classify Text

- Instances (training samples) are text documents
- Classification labels can be: like/dislike, etc.
- The task is to learn from these training examples to predict the class of unseen documents

Design issue:

- How to represent a text document in terms of attribute values


One approach:

- The attributes are the word positions
- The value of an attribute is the word found in that position

Note that the number of attributes may be different for each document.

We calculate the prior probabilities of the classes from the training samples. The probability of each word occurring in each position is also calculated, e.g. P("The" in first position | like document).


Second approach:

The frequency with which a word occurs is counted, irrespective of the word's position.

Note that here also the number of attributes may be different for each document.

The probabilities of words are of the form, e.g., P("The" | like document).
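A minimal bag-of-words sketch of this second approach (the two toy documents and their labels are invented; add-one smoothing is included so that unseen words do not zero out the product, a detail the slides do not cover):

```python
from collections import Counter

# Second approach: count word frequencies per class, ignoring position.
# The two training documents and their labels are invented.
docs = [
    ("great film loved the acting", "like"),
    ("the film was dull and slow",  "dislike"),
]

word_counts = {"like": Counter(), "dislike": Counter()}
for text, label in docs:
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def p_word_given_class(word, label):
    # P(word | class) with add-one (Laplace) smoothing.
    counts = word_counts[label]
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))

print(p_word_given_class("the", "like"))   # e.g. P("the" | like document)
```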


    Results

An algorithm based on the second approach was applied to the problem of classifying newsgroup articles:

- 20 newsgroups were considered
- 1,000 articles from each newsgroup were collected (20,000 articles in total)
- The naïve Bayes algorithm was applied using two-thirds of these articles as training samples
- Testing was done over the remaining third


- Given 20 newsgroups, we would expect random guessing to achieve a classification accuracy of 5%
- The accuracy achieved by this program was 89%


    Minor Variant

The algorithm used only a subset of the words appearing in the documents:

- The 100 most frequent words were removed (these include words such as "the", "and" and "of")
- Any word occurring fewer than 3 times was also removed
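A sketch of that vocabulary pruning (the word counts are stand-ins, and the cut-off of 2 most-frequent words is scaled down from the lecture's 100 so the effect is visible on toy data):

```python
from collections import Counter

# Prune the vocabulary: drop the N most frequent words and any word
# occurring fewer than 3 times. N = 100 in the lecture; N = 2 here.
counts = Counter({"the": 50, "and": 40, "goal": 7, "team": 5, "ref": 2})
N = 2

most_frequent = {w for w, _ in counts.most_common(N)}
vocabulary = [w for w, n in counts.items()
              if n >= 3 and w not in most_frequent]
print(vocabulary)  # ['goal', 'team']: too-common and too-rare words dropped
```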


Reference

Chapter 6 of T. Mitchell, Machine Learning, McGraw-Hill, 1997.