Improving Text Classification by Shrinkage in a Hierarchy of Classes Andrew McCallum Just Research & CMU Tom Mitchell CMU Roni Rosenfeld CMU Andrew Y. Ng MIT AI Lab


Post on 05-Jan-2016


Page 1

Improving Text Classification by Shrinkage in a

Hierarchy of Classes

Andrew McCallum

Just Research & CMU

Tom Mitchell

CMU

Roni Rosenfeld

CMU

Andrew Y. Ng

MIT AI Lab

Page 2

The Task: Document Classification (also “Document Categorization”, “Routing” or “Tagging”)

Automatically placing documents in their correct categories.

[Figure: Categories: Magnetism, Relativity, Evolution, Botany, Irrigation, Crops, ... Training data: example documents “corn wheat silo farm grow...”, “corn tulips splicing grow...”, “water grating ditch farm tractor...”, “selection mutation Darwin Galapagos DNA...”, ... Testing data: “grow corn tractor…”, to be placed in (Crops).]

Page 3

The Idea: “Shrinkage” / “Deleted Interpolation”

We can improve the parameter estimates in a leaf by averaging them with the estimates in its ancestors.

[Figure: the same categories arranged in a hierarchy under Science: Physics (Magnetism, Relativity), Biology (Evolution, Botany) and Agriculture (Irrigation, Crops). Training data as before; the testing document “corn grow tractor…” is to be placed in (Crops).]
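The averaging runs over the nodes on the path from a leaf up to the root. A minimal sketch, assuming the hierarchy is stored as a child-to-parent map; `PARENT` and `path_to_root` are hypothetical names, and the map is simply the example hierarchy as read from the figure:

```python
# Hypothetical child -> parent map for the example hierarchy on this slide.
PARENT = {
    "Magnetism": "Physics", "Relativity": "Physics",
    "Evolution": "Biology", "Botany": "Biology",
    "Irrigation": "Agriculture", "Crops": "Agriculture",
    "Physics": "Science", "Biology": "Science", "Agriculture": "Science",
}


def path_to_root(node: str) -> list[str]:
    """Return [node, parent, grandparent, ..., root] by following PARENT links."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


print(path_to_root("Crops"))  # ['Crops', 'Agriculture', 'Science']
```

A leaf's shrunken estimate then mixes the estimates of exactly these nodes.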

Page 4

A Probabilistic Approach to Document Classification

Naïve Bayes:

$$c^* = \arg\max_{c_j} \Pr(c_j) \prod_{i=1}^{|d|} \Pr(w_{d_i} \mid c_j)$$

where c_j is a class, d is a document, and w_{d_i} is the i-th word of document d.

Maximum a posteriori estimate of Pr(w|c), with a Dirichlet prior, α = 1 (a.k.a. Laplace smoothing):

$$\hat{\Pr}(w_i \mid c_j) = \frac{1 + \sum_{d_k \in c_j} N(w_i, d_k)}{|V| + \sum_{t=1}^{|V|} \sum_{d_k \in c_j} N(w_t, d_k)}$$

where N(w,d) is the number of times word w occurs in document d, and V is the vocabulary.
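The two formulas above can be sketched directly in Python. This is a minimal illustration, assuming documents are already tokenized into word lists and that the vocabulary is taken from the training data; `train_nb` and `classify` are hypothetical names. Scores are summed in log space to avoid underflow, and test words outside the vocabulary are skipped (an assumption, since the slide does not say how they are handled).

```python
import math
from collections import Counter


def train_nb(docs_by_class: dict[str, list[list[str]]]):
    """Estimate Pr(c_j) and the Laplace-smoothed Pr(w | c_j) from tokenized documents."""
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    n_docs = sum(len(docs) for docs in docs_by_class.values())
    priors, cond = {}, {}
    for c, docs in docs_by_class.items():
        priors[c] = len(docs) / n_docs
        counts = Counter(w for d in docs for w in d)  # sum over d_k in c_j of N(w, d_k)
        total = sum(counts.values())                  # sum over all words w_t in V
        cond[c] = {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}
    return vocab, priors, cond


def classify(doc, vocab, priors, cond):
    """argmax_c of log Pr(c) + sum_i log Pr(w_{d_i} | c); out-of-vocabulary words are skipped."""
    def score(c):
        return math.log(priors[c]) + sum(math.log(cond[c][w]) for w in doc if w in vocab)
    return max(priors, key=score)


model = train_nb({"A": [["x", "x", "y"]], "B": [["z", "z", "y"]]})
print(classify(["x", "y"], *model))  # A
```

Because every vocabulary word gets the pseudo-count 1, no class assigns zero probability to a word it never saw in training.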

Page 5

“Shrinkage” / “Deleted Interpolation”

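$$\Pr^{\text{SHRINKAGE}}(\text{``tractor''} \mid \text{Crops}) = \sum_{\text{ancestor}=0}^{\#\text{ancestors of Crops}} \lambda_{\text{ancestor}} \, \hat{\Pr}(\text{``tractor''} \mid \text{ancestor})$$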

[James and Stein, 1961] / [Jelinek and Mercer, 1980]

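$$\Pr_{\text{UNIFORM}}(\text{``tractor''}) = \frac{1}{|V|}$$

[Figure: the hierarchy Science → Physics (Magnetism, Relativity), Biology (Evolution, Botany), Agriculture (Irrigation, Crops), together with the (Uniform) distribution; the estimates P̂r("tractor"|Crops), P̂r("tractor"|Agriculture) and P̂r("tractor"|Science) are attached to the nodes on the path from Crops to the root.]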
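The shrinkage estimate is a weighted sum over the path from the leaf to the root, plus the uniform term 1/|V|. A minimal sketch, assuming each node's own estimate is already available as a word-to-probability dictionary (how those per-node estimates are computed is left open here); `shrinkage_estimate` and the numbers in the usage example are hypothetical:

```python
def shrinkage_estimate(word, path, estimates, lambdas, vocab_size):
    """Mix the estimates along `path` (leaf first, root last) with a uniform 1/|V| term.

    estimates[node] maps word -> Pr_hat(word | node); words missing from a node's
    dictionary are treated as probability 0 for that node. lambdas holds one weight
    per node on the path plus a final weight for the uniform distribution.
    """
    if len(lambdas) != len(path) + 1:
        raise ValueError("need one weight per node plus one for the uniform distribution")
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    # zip stops at len(path), so the last (uniform) weight is added separately.
    mixed = sum(lam * estimates[node].get(word, 0.0) for lam, node in zip(lambdas, path))
    return mixed + lambdas[-1] / vocab_size


# Hypothetical estimates: "tractor" never occurs in the Crops training data.
estimates = {
    "Crops": {"corn": 0.04},
    "Agriculture": {"tractor": 0.02},
    "Science": {"tractor": 0.005},
}
p = shrinkage_estimate("tractor", ["Crops", "Agriculture", "Science"],
                       estimates, [0.5, 0.3, 0.1, 0.1], vocab_size=1000)
print(round(p, 6))  # 0.0066
```

Even though the leaf alone gives "tractor" probability 0, its ancestors and the uniform component give it a nonzero shrunken probability.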

Page 6

Learning Mixture Weights

[Figure: the path from Crops up through Agriculture and Science to Uniform, with mixture weights λ_child^Crops, λ_parent^Crops, λ_grandparent^Crops and λ_uniform^Crops; Crops training documents such as “corn wheat silo farm grow...”.]

Learn the λ's via EM, performing the E-step with leave-one-out cross-validation.

E-step: Use the current λ's to estimate the degree to which each node was likely to have generated the words in held-out documents.

M-step: Use the estimates to recalculate new values for the λ's.
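One way to read "leave-one-out" here, sketched under the assumption that each node's estimate is a maximum-likelihood word frequency: when a word from a training document is scored in the E-step, that document's own counts are first subtracted from the node's counts, so the document does not vote for itself. `loo_estimate` is a hypothetical helper, not taken from the slides.

```python
from collections import Counter


def loo_estimate(word: str, node_counts: Counter, held_out_doc: list[str]) -> float:
    """Maximum-likelihood Pr(word | node) after removing held_out_doc's own counts.

    Assumes held_out_doc was one of the training documents counted in node_counts.
    """
    doc_counts = Counter(held_out_doc)
    remaining = sum(node_counts.values()) - len(held_out_doc)
    if remaining <= 0:
        return 0.0  # the node has no other training text to estimate from
    return (node_counts[word] - doc_counts[word]) / remaining


docs = [["corn", "corn", "silo"], ["corn", "grow"]]
counts = Counter(w for d in docs for w in d)
print(loo_estimate("grow", counts, docs[1]))  # 0.0
print(loo_estimate("corn", counts, docs[1]))  # 0.6666666666666666
```

Here "grow" occurs only in the held-out document, so its leave-one-out estimate drops to 0; this is the situation in which the E-step shifts weight toward the ancestors and the uniform component.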

Page 7

Learning Mixture Weights

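E-step:

$$\beta_j^a = \sum_{w_t \in H_j} \frac{\lambda_j^a \, \hat{\Pr}(w_t \mid c_j^a)}{\sum_m \lambda_j^m \, \hat{\Pr}(w_t \mid c_j^m)}$$

M-step:

$$\lambda_j^a = \frac{\beta_j^a}{\sum_m \beta_j^m}$$

where H_j is the set of held-out words for class c_j and c_j^a is the a-th ancestor of c_j (c_j^0 = c_j itself).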
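A minimal sketch of these two updates for a single class c_j, assuming the per-component probabilities of each held-out word have already been computed (for example with leave-one-out estimates); `em_mixture_weights`, the equal-weight initialization and the fixed iteration count are illustrative choices, not taken from the slides.

```python
def em_mixture_weights(component_probs, n_iters=50):
    """Learn the mixture weights lambda_j^a for one class by EM.

    component_probs[t][a] = Pr_hat(w_t | c_j^a) for held-out word w_t and component a
    (leaf, parent, ..., uniform). Including the uniform component keeps every
    denominator positive.
    """
    k = len(component_probs[0])
    lambdas = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iters):
        # E-step: beta^a = sum_t lambda^a p_ta / sum_m lambda^m p_tm
        betas = [0.0] * k
        for probs in component_probs:
            weighted = [lam * p for lam, p in zip(lambdas, probs)]
            z = sum(weighted)
            if z == 0.0:
                continue  # no component explains this word; skip it
            for a in range(k):
                betas[a] += weighted[a] / z
        # M-step: lambda^a = beta^a / sum_m beta^m
        total = sum(betas)
        if total == 0.0:
            break
        lambdas = [b / total for b in betas]
    return lambdas


print(em_mixture_weights([[0.4, 0.1], [0.0, 0.1]]))
```

In the usage call, the first component explains only one of the two held-out words, and the weights settle near 1/3 and 2/3.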

Page 8

Newsgroups Data Set

[Figure: the class hierarchy: computers (mac, ibm, graphics, windows, X), religion (atheism, christian, misc), sport (baseball, hockey), politics (guns, mideast, misc), motor (auto, motorcycle).]

15 classes, 15k documents, 1.7 million words, 52k vocabulary

(Subset of Ken Lang’s 20 Newsgroups set)

Page 9

Newsgroups Hierarchy Mixture Weights

# training docs   Class                             child   parent   g’parent   uniform
235               /politics/talk.politics.guns      0.368   0.092    0.017      0.522
235               /politics/talk.politics.mideast   0.256   0.132    0.001      0.611
235               /politics/talk.politics.misc      0.197   0.213    0.026      0.564
7497              /politics/talk.politics.guns      0.801   0.089    0.048      0.061
7497              /politics/talk.politics.mideast   0.859   0.061    0.010      0.071
7497              /politics/talk.politics.misc      0.762   0.126    0.043      0.068

Page 10

Newsgroups Hierarchy Mixture Weights

[Figure: chart of the mixture weights (0 to 1) on the leaf, parent, root and uniform components for guns, mideast and misc, first with 235 training documents (15/class), then with 7497 training documents (~500/class).]

Page 11

Industry Sector Data Set

[Figure: part of the class hierarchy, with top-level sectors transportation, utilities, consumer, energy, services, … (11), and leaf classes such as water, air, railroad, trucking, electric, gas, coal, oil & gas, integrated, misc, appliance, furniture, film and communication.]

71 classes, 6.5k documents, 1.2 million words, 30k vocabulary

www.marketguide.com

Page 12

Industry Sector Classification Accuracy


Page 13

Newsgroups Classification Accuracy


Page 14

Yahoo Science Data Set

[Figure: part of the class hierarchy, with top-level categories agriculture, biology, physics, CS, space, … (30), and leaf classes such as dairy, crops, agronomy, forestry, botany, evolution, cell, magnetism, relativity, AI, HCI, courses, craft and missions.]

264 classes, 14k documents, 3 million words, 76k vocabulary

www.yahoo.com/Science

Page 15

Yahoo Science Classification Accuracy


Page 16

Related Work

• Shrinkage in Statistics:
  – [Stein 1955], [James & Stein 1961]
• Deleted Interpolation in Language Modeling:
  – [Jelinek & Mercer 1980], [Seymore & Rosenfeld 1997]
• Bayesian Hierarchical Modeling for n-grams:
  – [MacKay & Peto 1994]
• Class hierarchies for text classification:
  – [Koller & Sahami 1997]
• Using EM to set mixture weights in a hierarchical clustering model for unsupervised learning:
  – [Hofmann & Puzicha 1998]

Page 17

Conclusions

• Shrinkage in a hierarchy of classes can dramatically improve classification accuracy (29%)

• Shrinkage helps especially when training data is sparse. In models more complex than naïve Bayes, it should be even more helpful.

• [The hierarchy can be pruned for exponential reduction in computation necessary for classification; only minimal loss of accuracy.]

Page 18

Future Work

• Learning hierarchies that aid classification.

• Using more complex generative models:
  – Capturing word dependencies
  – Clustering words in each ancestor