Latent Dirichlet Allocation (Nicolas Loeff)

Posted on 06-Apr-2018


8/3/2019 Latent Dirichlet Allocation (Nicolas Loeff)

http://slidepdf.com/reader/full/latent-dirichlet-allocation-nicolas-loeff 1/39

Latent Dirichlet Allocation
D. Blei, A. Ng, M. Jordan

Includes some slides adapted from J. Ramos at Rutgers, M. Steyvers and M. Rosen-Zvi at UCI, and L. Fei-Fei at UIUC.


Overview

• What is so special about text?
• Classification methods
• LSI
• Unigram / Mixture of Unigrams
• Probabilistic LSI (Aspect Model)
• LDA model
• Geometric interpretation


What is so special about text?

• No obvious relation between features
• High dimensionality (the vocabulary size V is often larger than the number of documents!)
• Importance of speed


The need for dimensionality reduction

Representation:
• Documents are represented as vectors in the word space: the 'bag of words' representation.
• It is a sparse representation, V >> |D|.
• A need to define conceptual closeness.


Bag of words

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value


Bag of words

• The order of words in a document can be ignored; only the counts matter.
• Probability theory: exchangeability (which includes IID) (Aldous, 1985).
• Exchangeable random variables have a representation as a mixture distribution (de Finetti, 1990).
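A minimal Python sketch of the bag-of-words idea, assuming simple whitespace tokenization (our choice, not the slides'). A document becomes a multiset of word counts, so any reordering of its words yields the same representation, which is exactly the exchangeability assumption above. The function name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(tokens):
    """Map a token sequence to its word counts; token order is discarded."""
    return Counter(tokens)


doc = "the bank raised the loan rate".split()
shuffled = ["rate", "the", "loan", "bank", "the", "raised"]

# Two orderings of the same words give the same bag of words.
print(bag_of_words(doc) == bag_of_words(shuffled))
print(bag_of_words(doc)["the"])
```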


What does this have to do with vision?

[Figure: an object represented as a bag of visual 'words']


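TF-IDF Weighting Scheme (Salton and McGill, 1983)

Given a corpus D, a word w, and a document d, calculate
$$w_d = f_{w,d} \cdot \log\left(\frac{|D|}{f_{w,D}}\right)$$
where $f_{w,d}$ is the number of times w appears in d and $f_{w,D}$ is the number of documents in D that contain w.

Many varieties of the basic scheme exist.

Search procedure: scan each d, compute each $w_{i,d}$, and return the set $D' \subseteq D$ that maximizes $\sum_i w_{i,d}$.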
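The weighting formula and search procedure can be sketched in a few lines of Python. This is one plain variant among the "many varieties", assuming f_{w,D} counts documents containing w and that the sum over i runs over the query words; `tfidf_weights` and `search` are hypothetical names and the corpus is invented.

```python
import math
from collections import Counter


def tfidf_weights(corpus):
    """w_d = f_{w,d} * log(|D| / f_{w,D}) for every word of every document.

    f_{w,d}: count of w in d; f_{w,D}: number of documents containing w.
    """
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({w: tf[w] * math.log(n_docs / doc_freq[w]) for w in tf})
    return weights


def search(query, corpus):
    """Return indices of the documents that maximize the summed weights of the query words."""
    weights = tfidf_weights(corpus)
    scores = [sum(wd.get(w, 0.0) for w in query) for wd in weights]
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


corpus = [["money", "bank", "loan"], ["river", "bank", "stream"], ["money", "money", "market"]]
print(search(["money"], corpus))
```

A word that occurs in every document gets weight log(1) = 0, so it cannot influence the ranking.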


A Spatial Representation: Latent Semantic Analysis (Deerwester, 1990)

Document/Term count matrix:

            Doc1  Doc2  Doc3  …
LOVE          34     0     3  …
SOUL          12     0     2  …
RESEARCH       0    19     6  …
SCIENCE        0    16     1  …

SVD maps the words into a high-dimensional space, though not as high as |V|.

[Figure: SOUL, LOVE, RESEARCH and SCIENCE plotted as points in the reduced semantic space]

• Each word is a single point in semantic space (dimensionality reduction).
• Similarity is measured by the cosine of the angle between word vectors.
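A small NumPy sketch of LSA as described here: take the SVD of a term-by-document count matrix, keep the top k singular directions, and compare words by cosine similarity. The counts are our reading of the LOVE/SOUL/RESEARCH/SCIENCE example and k = 2 is an arbitrary choice; the helper names are hypothetical.

```python
import numpy as np

words = ["LOVE", "SOUL", "RESEARCH", "SCIENCE"]
# Term-by-document counts: rows are words, columns are Doc1..Doc3.
counts = np.array([
    [34, 0, 3],
    [12, 0, 2],
    [0, 19, 6],
    [0, 16, 1],
], dtype=float)


def lsa_word_vectors(term_doc, k):
    """Rank-k LSA: word i becomes row i of U_k * S_k from the SVD term_doc = U S V^T."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = lsa_word_vectors(counts, k=2)
sim = {
    (a, b): cosine(vectors[i], vectors[j])
    for i, a in enumerate(words)
    for j, b in enumerate(words)
}
print(round(sim[("LOVE", "SOUL")], 3), round(sim[("LOVE", "RESEARCH")], 3))
```

Flipping the sign of a singular vector flips the same coordinate for every word, so the cosines do not depend on the SVD's sign convention.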


Feature Vector representation

From: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth


Classification: assigning words to topics

Different models for data:
• Discrete classifier: models the boundaries between different classes of the data and predicts a categorical output, e.g., SVM.
• Density estimator: models the distribution of the data points themselves, i.e., generative models, e.g., NB (naive Bayes).
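The density-estimator route can be made concrete with multinomial naive Bayes, which models p(c) and p(w | c) and classifies by Bayes' rule. A minimal sketch, assuming add-one (Laplace) smoothing and an invented two-class toy corpus; `train_multinomial_nb` and `predict` are hypothetical names.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, smoothing=1.0):
    """Fit class priors p(c) and smoothed word distributions p(w | c)."""
    vocab = {w for doc in docs for w in doc}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for doc in class_docs for w in doc)
        total = sum(counts.values()) + smoothing * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + smoothing) / total) for w in vocab}
    return log_prior, log_lik


def predict(doc, log_prior, log_lik):
    """Return argmax_c log p(c) + sum_n log p(w_n | c); words outside the vocabulary are skipped."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [["money", "bank", "loan"], ["loan", "money"],
        ["river", "stream", "bank"], ["stream", "river"]]
labels = ["finance", "finance", "nature", "nature"]
model = train_multinomial_nb(docs, labels)
print(predict(["river", "bank"], *model))
```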


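Generative Models – Latent semantic structure

Latent structure ℓ generates the words w.

Distribution over words:
$$P(\mathbf{w}) = \sum_{\ell} P(\mathbf{w}, \ell)$$

Inferring latent structure:
$$P(\ell \mid \mathbf{w}) = \frac{P(\mathbf{w} \mid \ell)\, P(\ell)}{P(\mathbf{w})}$$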
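The two equations can be checked numerically for a single observed word. This sketch assumes a latent variable with two values and a three-word vocabulary; all probabilities are invented and the function names are hypothetical.

```python
import numpy as np

p_latent = np.array([0.6, 0.4])            # P(l)
p_word_given_latent = np.array([           # P(w | l), one row per latent value
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])


def word_marginal(p_latent, p_word_given_latent):
    """P(w) = sum_l P(w, l) = sum_l P(w | l) P(l), for every word w."""
    return p_latent @ p_word_given_latent


def latent_posterior(word, p_latent, p_word_given_latent):
    """P(l | w) = P(w | l) P(l) / P(w) for one observed word index."""
    joint = p_word_given_latent[:, word] * p_latent   # P(w, l) for each l
    return joint / joint.sum()                        # joint.sum() is P(w)


print(word_marginal(p_latent, p_word_given_latent))
print(latent_posterior(0, p_latent, p_word_given_latent))
```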


Topic Models

Unsupervised learning of topics (“gist”) of documents:

• articles/chapters
• conversations
• emails
• ... any verbal context

Topics are useful latent structures to explain semantic association.


Probabilistic Generative Model

Each document is a probability distribution over topics

Each topic is a probability distribution over words


Generative Process

[Figure: TOPIC 1 generates money, loan, bank; TOPIC 2 generates river, stream, bank. The topics are the mixture components; each document mixes them with its own mixture weights (.8/.2 for DOCUMENT 1, .3/.7 for DOCUMENT 2).]

DOCUMENT 1: money 1 bank 1 bank 1 loan 1 river 2 stream 2 bank 1 money 1 river 2 bank 1 money 1 bank 1 loan 1 money 1 stream 2 bank 1 money 1 bank 1 bank 1 loan 1 river 2 stream 2 bank 1 money 1 river 2 bank 1 money 1 bank 1 loan 1 bank 1 money 1 stream 2

DOCUMENT 2: river 2 stream 2 bank 2 stream 2 bank 2 money 1 loan 1 river 2 stream 2 loan 1 bank 2 river 2 bank 2 bank 1 stream 2 river 2 loan 1 bank 2 stream 2 bank 2 money 1 loan 1 river 2 stream 2 bank 2 stream 2 bank 2 money 1 river 2 stream 2 loan 1 bank 2 river 2 bank 2 money 1 bank 1 stream 2 river 2 bank 2 stream 2 bank 2 money 1

(The number after each word is the topic that generated it.)

Bayesian approach: use priors
• Mixture weights ~ Dirichlet(α)
• Mixture components ~ Dirichlet(β)
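The generative process with these priors can be sketched in NumPy. Following the slide's notation, both the per-document mixture weights θ and the topics φ get symmetric Dirichlet priors (α and β); the corpus sizes and the name `generate_lda_corpus` are hypothetical.

```python
import numpy as np


def generate_lda_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Sample a toy corpus from the LDA generative process with symmetric priors.

    Mixture components: phi_k   ~ Dirichlet(beta),  one word distribution per topic
    Mixture weights:    theta_d ~ Dirichlet(alpha), one topic distribution per document
    For each token:     z ~ Categorical(theta_d), then w ~ Categorical(phi_z)
    """
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)    # shape (K, V)
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)      # shape (D, K)
    docs, assignments = [], []
    for d in range(n_docs):
        z = rng.choice(n_topics, size=doc_len, p=theta[d])            # topic of each token
        w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])   # word of each token
        docs.append(w)
        assignments.append(z)
    return docs, assignments, theta, phi


docs, assignments, theta, phi = generate_lda_corpus(3, 20, 2, 5, alpha=0.5, beta=0.5)
print(theta.round(2))
```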


Vision: Topic = Object categories


Simple Model: Unigram

Words of a document are drawn IID from a single multinomial distribution:
$$p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)$$
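A minimal sketch of the unigram model, assuming the single multinomial is fit by maximum likelihood (pooled word frequencies) on a toy corpus; `fit_unigram` and `unigram_log_likelihood` are hypothetical names.

```python
import math
from collections import Counter


def fit_unigram(corpus):
    """Maximum-likelihood unigram: p(w) = count(w) / total tokens, pooled over all documents."""
    counts = Counter(w for doc in corpus for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unigram_log_likelihood(doc, word_probs):
    """log p(w) = sum_n log p(w_n): every word is an IID draw from one distribution."""
    return sum(math.log(word_probs[w]) for w in doc)


probs = fit_unigram([["a", "a", "b"], ["a"]])
print(probs, unigram_log_likelihood(["a", "b"], probs))
```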


Unigram Mixture Model

First choose a topic z, then generate words conditionally independent given the topic:
$$p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)$$
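The mixture-of-unigrams likelihood sums over the single topic of the whole document. A NumPy sketch, assuming known p(z) and p(w | z) (invented values below) and using log-sum-exp so long documents do not underflow; the function name is hypothetical.

```python
import numpy as np


def mixture_of_unigrams_log_likelihood(doc, p_z, p_w_given_z):
    """log p(w) = log sum_z p(z) prod_n p(w_n | z), computed with log-sum-exp.

    doc:         word indices w_1..w_N
    p_z:         topic probabilities, shape (K,)
    p_w_given_z: per-topic word distributions, shape (K, V)
    """
    # One log-term per topic: log p(z) + sum_n log p(w_n | z).
    log_terms = np.log(p_z) + np.log(p_w_given_z[:, doc]).sum(axis=1)
    m = log_terms.max()
    return float(m + np.log(np.exp(log_terms - m).sum()))


p_z = np.array([0.3, 0.7])
p_w_given_z = np.array([[0.5, 0.5], [0.9, 0.1]])
print(np.exp(mixture_of_unigrams_log_likelihood([0, 0, 1], p_z, p_w_given_z)))
```

Here the whole document shares one z, which is why the product over words sits inside the sum over topics.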


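Probabilistic Latent Semantic Indexing (Hofmann, 1999)

A document label d in the training set and a word $w_n$ are conditionally independent given the topic z:
$$p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)$$

• Not truly generative (d is a dummy random variable indexing the training documents).
• The number of parameters grows with the size of the corpus (overfitting).
• A document may contain several topics.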


Vision app.: Sivic et al., 2005

[Figure: pLSI graphical model (d, z, w; plates N and D) and an example topic, "face"]


LDA


Vision app.: Fei-Fei Li, 2005

[Figure: graphical model with nodes π, c, z, w (plates N and D) and an example category, "beach"]


Example: Word density distribution


A geometric interpretation


LDA

• Topics are sampled repeatedly within each document (like pLSI).
• But the number of parameters does not grow with the size of the corpus.
• Problem: inference.
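The parameter-growth argument is simple arithmetic. This sketch counts parameters the way Blei, Ng and Jordan do (ignoring normalization constraints): pLSI has kV + kM parameters for k topics, vocabulary V and M training documents, while LDA has k + kV. The corpus sizes below are arbitrary and the function names hypothetical.

```python
def plsi_num_params(n_topics, vocab_size, n_train_docs):
    """pLSI: k word distributions (k*V) plus one p(z | d) per training document (k*M)."""
    return n_topics * vocab_size + n_topics * n_train_docs


def lda_num_params(n_topics, vocab_size):
    """LDA: a k-dimensional Dirichlet parameter alpha plus the k x V topic-word matrix."""
    return n_topics + n_topics * vocab_size


for m in (1_000, 100_000):
    print(m, plsi_num_params(100, 10_000, m), lda_num_params(100, 10_000))
```

Growing the corpus a hundredfold multiplies the pLSI term kM by a hundred, while the LDA count is unchanged.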


LDA - Inference

Coupling between Dirichlet distributions makes exact inference intractable.

Blei, 2001: Variational Approximation
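A sketch of the per-document variational updates used in this approximation, with the topics β held fixed: φ_{n,i} ∝ β_{i,w_n} exp(ψ(γ_i)) and γ_i = α_i + Σ_n φ_{n,i}, iterated a fixed number of times rather than to convergence. To stay self-contained it uses our own digamma approximation (recurrence plus asymptotic series) instead of a library; all names and the toy β are hypothetical.

```python
import math

import numpy as np


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def lda_variational_doc(doc, alpha, beta, n_iter=50):
    """Mean-field updates for one document.

    doc:   word indices w_1..w_N
    alpha: Dirichlet parameter, shape (K,)
    beta:  topic-word probabilities, shape (K, V)
    """
    n_topics = len(alpha)
    gamma = alpha + len(doc) / n_topics                 # initialise gamma_i = alpha_i + N/K
    phi = np.full((len(doc), n_topics), 1.0 / n_topics)
    for _ in range(n_iter):
        # exp(digamma(sum(gamma))) is shared by all topics and cancels when phi is normalised.
        weights = np.exp([digamma(g) for g in gamma])
        phi = beta[:, doc].T * weights                  # shape (N, K)
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi


alpha = np.array([0.5, 0.5])
beta = np.array([[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]])
gamma, phi = lda_variational_doc([0, 1, 0, 1, 0], alpha, beta)
print(gamma.round(3))
```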


LDA - Inference

Other procedures:
• Markov chain Monte Carlo (Griffiths et al., 2002)
• Expectation propagation (Minka et al., 2002)
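The Monte Carlo route is usually implemented as collapsed Gibbs sampling, where θ and φ are integrated out and only the topic assignments are sampled. A minimal NumPy sketch, assuming symmetric priors α and β and a fixed number of sweeps (no burn-in or convergence checks); `collapsed_gibbs_lda` and the toy corpus are hypothetical.

```python
import numpy as np


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_sweeps=100, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and beta.

    Each token's topic is resampled from
        p(z_i = k | z_-i, w) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
    with all counts excluding token i (the document-length denominator is constant in k).
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))    # tokens in doc d assigned to topic k
    n_kw = np.zeros((n_topics, vocab_size))   # times word w is assigned to topic k
    n_k = np.zeros(n_topics)                  # total tokens assigned to topic k
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]  # random initial topics

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove token i from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional over topics, then sample a new topic and add it back.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Point estimate of the topic-word distributions from the final sample.
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return z, phi, n_dk


docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3]]
z, phi, n_dk = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4, alpha=0.5, beta=0.1, n_sweeps=20)
print(phi.round(2))
```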


Experiments

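Perplexity: the inverse of the geometric mean per-word likelihood (a monotonically decreasing function of the likelihood):
$$\text{perplexity}(D_{\text{test}}) = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}$$

Idea: lower perplexity implies better generalization.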
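The perplexity formula in Python, assuming each held-out document's log-likelihood log p(w_d) has already been computed by some model; the function name and the numbers in the sanity check are hypothetical.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d) over a held-out set of documents.

    doc_log_likelihoods: natural-log likelihood log p(w_d) of each test document
    doc_lengths:         number of tokens N_d in each test document
    """
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


# Sanity check: a model that is uniform over V = 4 words has perplexity 4,
# i.e. it is as uncertain as a fair choice among 4 words at every position.
lengths = [3, 5]
log_liks = [n * math.log(1 / 4) for n in lengths]
print(perplexity(log_liks, lengths))
```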


Experiments – Nematode corpus


Experiments – AP corpus


Polysemy

• PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS
• PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED
• TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT
• JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL
• HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION
• STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW
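Polysemy shows up as the same word ranking highly in more than one topic. A small sketch that finds those words in the six top-word lists above; the topic labels ("printing", "theater", and so on) are our own, hypothetical names, not the model's.

```python
# Top words of six topics as listed above; the dictionary keys are our own labels.
TOPICS = {
    "printing": "PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS",
    "theater": "PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED",
    "sports": "TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT",
    "law": "JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL",
    "science": "HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION",
    "school": "STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW",
}


def polysemous_words(topics):
    """Map each word that appears in more than one topic's list to the topics containing it."""
    owners = {}
    for label, words in topics.items():
        for word in words.split():
            owners.setdefault(word, []).append(label)
    return {word: labels for word, labels in owners.items() if len(labels) > 1}


for word, labels in sorted(polysemous_words(TOPICS).items()):
    print(word, labels)
```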


Choosing the number of topics
• Subjective interpretability
• Bayesian model selection: Griffiths & Steyvers (2004)
• Generalization test
• Non-parametric Bayesian statistics: infinite models, i.e., models that grow with the size of the data (Teh, Jordan, Beal, & Blei, 2004; Blei, Griffiths, Jordan, & Tenenbaum, 2004)
