Text Classification Using Naive Bayes

Post on 10-Apr-2018


8/8/2019 Text Classification Using Naive Bayes

http://slidepdf.com/reader/full/text-classification-using-naive-bayes 1/26

Bayesian Classifiers Part 2

Contents

Simple Text Classification Using Naïve Bayes

Bayesian Belief Networks (Bayes Nets)

SIMPLE TEXT CLASSIFICATION USING NAÏVE BAYES

Learning to Classify Text

Learning to Classify Text

Learn_Naïve_Bayes_Text (Examples, V )
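Only the procedure's name survived extraction here. Below is a minimal Python sketch of the standard Learn_Naive_Bayes_Text training step (bag-of-words counts per class with Laplace-smoothed estimates); the function and variable names are mine, and the (count + 1) / (n + |V|) estimate is the usual smoothed form, not copied from the lost slide body:

```python
from collections import Counter

def learn_naive_bayes_text(examples, vocabulary):
    """examples: list of (list_of_words, class_label) pairs.
    Returns class priors P(c) and smoothed word likelihoods P(w|c)."""
    priors, likelihoods = {}, {}
    classes = {label for _, label in examples}
    for c in classes:
        docs_c = [words for words, label in examples if label == c]
        # P(c): fraction of training documents with label c
        priors[c] = len(docs_c) / len(examples)
        counts = Counter(w for words in docs_c for w in words)
        n = sum(len(words) for words in docs_c)  # total word positions in class c
        # P(w|c) with Laplace smoothing: (count + 1) / (n + |Vocabulary|)
        likelihoods[c] = {w: (counts[w] + 1) / (n + len(vocabulary))
                          for w in vocabulary}
    return priors, likelihoods
```

The smoothing keeps every vocabulary word's probability nonzero, so a single unseen word cannot zero out a whole class score at classification time.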

Classify_Naïve_Bayes_Text (Doc)
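As above, only the name of the classification procedure survived. A minimal sketch of the step it names, assuming the priors and per-class word probabilities produced by training (names are mine; skipping out-of-vocabulary words is one common convention):

```python
import math

def classify_naive_bayes_text(doc, priors, likelihoods):
    """doc: list of words. Returns the class maximizing
    log P(c) + sum over word positions of log P(w|c).
    Words missing from the tables (out of vocabulary) are skipped."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)              # log-space avoids underflow
        for w in doc:
            if w in likelihoods[c]:
                score += math.log(likelihoods[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Working in log space matters in practice: a few hundred word probabilities multiplied directly would underflow to 0.0 in floating point.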

Twenty Newsgroups (Joachims, 1996)

1000 training documents from each of 20 groups (20,000 documents in all)

Use two thirds of them in learning to classify new documents according to which newsgroup each came from.

Newsgroups: comp.graphics, misc.forsale, comp.os.ms-windows.misc, rec.autos, comp.sys.ibm.pc.hardware, rec.motorcycles, comp.sys.mac.hardware, rec.sport.baseball, comp.windows.x, rec.sport.hockey, alt.atheism, sci.space, soc.religion.christian, sci.crypt, talk.religion.misc, sci.electronics, talk.politics.mideast, sci.med, talk.politics.misc, talk.politics.guns

Naive Bayes: 89% classification accuracy

Random guess: ? (1 in 20, i.e. 5%)

An article from rec.sport.hockey

Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu

From: xxx@yyy.zzz.edu (John Doe)

Subject: Re: This year's biggest and worst (opinion)...

Date: 5 Apr 93 09:53:39 GMT

I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided

Learning Curve for 20 Newsgroups

Accuracy vs. Training set size (1/3 withheld for test)

(Note that the x-axis is in log scale.)

Problems In Classifying Text

Frequent words, e.g. the, of

Words with insignificant occurrence, e.g. occurring less than three times

Remove them from Vocabulary!
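The two pruning steps the slide recommends can be sketched as a small vocabulary filter. The function name, the stop-word set in the test, and the min_count default are mine (the default follows the slide's "less than three"):

```python
from collections import Counter

def prune_vocabulary(docs, stop_words, min_count=3):
    """Build a vocabulary from tokenized docs, dropping stop words
    (frequent function words) and words occurring fewer than
    min_count times across the whole corpus."""
    counts = Counter(w for doc in docs for w in doc)
    return {w for w, c in counts.items()
            if w not in stop_words and c >= min_count}
```

Both cuts help Naive Bayes: stop words carry no class signal, and very rare words yield unreliable probability estimates.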

BAYESIAN BELIEF NETWORKS(BAYES NETS)

Overview

Bayesian Belief Network

Learning Bayesian Networks

– Data is fully observable and network structure is known: conditional probability tables from training data (as in the Naïve Bayes classifier)

– Network structure is known, but data is partially observable: conditional probability tables can be obtained in a manner similar to obtaining neural-network weights; another technique uses the EM algorithm

– Data is partially observable and network structure is unknown: ?

Bayesian Belief Networks

Interesting because:

– the Naive Bayes assumption of conditional independence is too restrictive

– but inference is intractable without some such assumptions...

– Bayesian belief networks describe conditional independence among subsets of variables

– this allows combining prior knowledge about (in)dependencies among variables with observed training data

Also called Bayes Nets

Conditional Independence
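Only the title of this slide survived extraction. The standard definition it presumably carried, stated here as a reconstruction consistent with how conditional independence is used on the following slides:

```latex
X \text{ is conditionally independent of } Y \text{ given } Z \iff
P(X = x_i \mid Y = y_j,\, Z = z_k) = P(X = x_i \mid Z = z_k)
\quad \forall\, x_i, y_j, z_k,
```

written compactly as $P(X \mid Y, Z) = P(X \mid Z)$. Naive Bayes uses exactly this assumption to factor the likelihood: $P(X_1, X_2 \mid V) = P(X_1 \mid V)\, P(X_2 \mid V)$.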

Bayesian Belief Network

Network represents a set of conditional independence assertions:

– Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors.

– Directed acyclic graph
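These independence assertions license the standard compact factorization of the joint distribution over the network's variables (a textbook result, not in the surviving slide text):

```latex
P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P\big(y_i \mid \mathrm{Parents}(Y_i)\big)
```

Each factor is one row lookup in the conditional probability table attached to node $Y_i$, which is what makes the representation far smaller than the full joint table.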

Bayesian Belief Network

Inference in Bayesian Networks

How can one infer the values of one or more network variables, given observed values of others?

– The Bayes net contains all the information needed for this inference

– If only one variable has an unknown value, it is easy to infer

– In the general case, the problem is NP-hard

In practice, one can succeed in many cases:

– Exact inference methods work well for some network structures

– Monte Carlo methods simulate the network randomly to calculate approximate solutions
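The Monte Carlo idea can be shown concretely with rejection sampling on a hypothetical two-node network Storm → Thunder. The prior and CPT numbers below are made up for illustration, not taken from the slides:

```python
import random

def estimate_p_storm_given_thunder(n_samples=100_000, seed=0):
    """Estimate P(Storm | Thunder) by rejection sampling on a tiny
    two-node network Storm -> Thunder.  P(Storm) = 0.10 and the
    CPT for Thunder are illustrative numbers only."""
    rng = random.Random(seed)
    kept = storms_among_kept = 0
    for _ in range(n_samples):
        storm = rng.random() < 0.10               # sample root: P(Storm) = 0.10
        p_thunder = 0.80 if storm else 0.05       # CPT row for Thunder given Storm
        thunder = rng.random() < p_thunder        # sample child given parent
        if thunder:                               # keep only samples matching evidence
            kept += 1
            storms_among_kept += storm
    return storms_among_kept / kept
```

For these numbers Bayes' rule gives the exact answer 0.8·0.1 / (0.8·0.1 + 0.05·0.9) = 0.64, so the estimate should land close to 0.64; as the slide says, simulating the network many times yields an approximate solution without any exact inference machinery.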

Learning of Bayesian Networks

Several variants of this learning task:

– Network structure might be known or unknown

– Training examples might provide values of all network variables, or just some

If structure is known and all variables are observed:

– Then it's as easy as training a Naive Bayes classifier

Suppose structure is known but variables are partially observable:

– e.g., observe ForestFire, Storm, BusTourGroup, Thunder, but not Lightning, Campfire

– Similar to training a neural network with hidden units

– In fact, one can learn the network's conditional probability tables using gradient ascent!

– Converges to the network h that (locally) maximizes P(D|h)

Learning of Bayesian Networks?

Maximization of P(D|h)

In principle, it is easy:

– Calculate P(D|h) for each h and return the h of maximum P(D|h)

In practice, h contains many, many continuous variables:

– Use a gradient descent (ascent) method

In general, h contains discrete variables, too:

– Use an algorithm for combinatorial optimization, such as the simulated annealing method

Gradient Ascent for Bayes Nets

Let $w_{ijk}$ denote one entry of the conditional probability table for variable $Y_i$:

$$w_{ijk} = P_h\big(Y_i = y_{ij} \mid \mathrm{Parents}(Y_i) = u_{ik}\big)$$

where $y_{ij}$ is the $j$-th value of $Y_i$ and $u_{ik}$ is the $k$-th assignment of values to $Y_i$'s parents.

Gradient Ascent for Bayes Nets

The log-likelihood of the training data $D$ decomposes over examples:

$$\ln P_h(D) = \ln \prod_{d \in D} P_h(d) = \sum_{d \in D} \ln P_h(d),$$

so, writing $w_{ijk} = P_h(y_{ij} \mid u_{ik})$ for the CPT entries,

$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}}.$$

Expanding $P_h(d)$ as a sum over the values of $Y_i$ and its parents:

$$\frac{\partial P_h(d)}{\partial w_{ijk}}
= \frac{\partial}{\partial w_{ijk}} \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'}, u_{ik'})
= \frac{\partial}{\partial w_{ijk}} \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'} \mid u_{ik'})\, P_h(u_{ik'})
= \frac{\partial}{\partial w_{ijk}} \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, w_{ij'k'}\, P_h(u_{ik'}).$$

Gradient Ascent for Bayes Nets

Only the term with $j' = j$ and $k' = k$ depends on $w_{ijk}$, so

$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)}\, P_h(d \mid y_{ij}, u_{ik})\, P_h(u_{ik}).$$

Applying Bayes' rule, $P_h(d \mid y_{ij}, u_{ik}) = \dfrac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(d)}{P_h(y_{ij}, u_{ik})}$, and cancelling $P_h(d)$:

$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})}
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{P_h(y_{ij} \mid u_{ik})}
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}.$$

Gradient Ascent for Bayes Nets
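The body of this slide did not survive extraction. A plausible reconstruction, assuming it showed the usual update-and-renormalize loop built from the gradient derived above (with $\eta$ a small positive learning rate):

```latex
w_{ijk} \leftarrow w_{ijk} + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}
```

after which the entries are renormalized so that $\sum_j w_{ijk} = 1$ and $0 \le w_{ijk} \le 1$, keeping each CPT column a valid probability distribution.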

More on Learning Bayes Nets

Summary: Bayesian Belief Networks

Combine prior knowledge with observed data

– Q: how does prior knowledge enter the network?

Impact of prior knowledge (when correct!) is to lower the sample complexity

Active research area:

– Extend from boolean to real-valued variables

– Parameterized distributions instead of tables

– Extend to first-order instead of propositional systems

– More effective inference methods

– ...
