
Page 1: supervised and relational topic models

Supervised and Relational Topic Models

David M. Blei

Department of Computer Science, Princeton University

October 5, 2009

Joint work with Jonathan Chang and Jon McAuliffe

Page 2: supervised and relational topic models

Topic modeling

• Large electronic archives of document collections require new statistical tools for analyzing text.

• Topic models have emerged as a powerful technique for unsupervised analysis of large document collections.

• Topic models posit latent topics in text using hidden random variables, and uncover that structure with posterior inference.

• Useful for tasks like browsing, search, information retrieval, etc.

Page 3: supervised and relational topic models

Examples of topic modeling

[Figure: five example topics discovered from a corpus of legal articles, shown as columns of top words:]

contractual    employment   female          markets     criminal
expectation    industrial   men             earnings    discretion
gain           local        women           investors   justice
promises       jobs         see             sec         civil
expectations   employees    sexual          research    process
breach         relations    note            structure   federal
enforcing      unfair       employer        managers    see
supra          agreement    discrimination  firm        officer
note           economic     harassment      risk        parole
perform        case         gender          large       inmates

Page 4: supervised and relational topic models

Examples of topic modeling

[Figure: a map of topics discovered from the abstracts of a computer science corpus. Word clusters include: "the, of, a, is, and" (function words); "algorithm, time, log, bound, functions, polynomial"; "logic, programs, systems, language, sets"; "system, systems, performance, analysis, distributed"; "graph, graphs, edge, minimum, vertices"; "proof, property, program, resolution, abstract"; "consensus, objects, messages, protocol, asynchronous"; "networks, queuing, asymptotic, product form, server, approximations"; "points, distance, convex"; "database, constraints, algebra, boolean, relational"; "formulas, first order, decision, temporal, queries"; "trees, regular"; "tree, search, compression"; "machine, domain, degree, degrees, polynomials"; "routing, adaptive, network, networks, protocols"; "networks, protocol, network, packets, link"; "database, transactions, retrieval, concurrency, restrictions"; "learning, learnable, statistical, examples, classes"; "merging, networks, sorting, multiplication"; "constraint, dependencies, local, consistency, tractable"; "logic, logics, query, theories, languages"; "quantum, automata, automaton, languages"; "online, scheduling, task, competitive, tasks"; "learning, knowledge, reasoning, verification, circuit". Regions are illustrated with representative paper titles:]

"An optimal algorithm for intersecting line segments in the plane"; "Recontamination does not help to search a graph"; "A new approach to the maximum-flow problem"; "The time complexity of maximum matching by simulated annealing"

"Quantum lower bounds by polynomials"; "On the power of bounded concurrency I: finite automata"; "Dense quantum coding and quantum finite automata"; "Classical physics and the Church--Turing Thesis"

"Nearly optimal algorithms and bounds for multilayer channel routing"; "How bad is selfish routing?"; "Authoritative sources in a hyperlinked environment"; "Balanced sequences and optimal routing"

"Single-class bounds of multi-class queuing networks"; "The maximum concurrent flow problem"; "Contention in shared memory algorithms"; "Linear probing with a nonuniform address distribution"

"Magic Functions: In Memoriam: Bernard M. Dwork 1923--1998"; "A mechanical proof of the Church-Rosser theorem"; "Timed regular expressions"; "On the power and limitations of strictness analysis"

"Module algebra"; "On XML integrity constraints in the presence of DTDs"; "Closure properties of constraints"; "Dynamic functional dependencies and database aging"

Page 5: supervised and relational topic models

Examples of topic modeling

[Figure: the top words of one topic from the journal Science, tracked by decade:]

1880: electric, machine, power, engine, steam, two, machines, iron, battery, wire
1890: electric, power, company, steam, electrical, machine, two, system, motor, engine
1900: apparatus, steam, power, engine, engineering, water, construction, engineer, room, feet
1910: air, water, engineering, apparatus, room, laboratory, engineer, made, gas, tube
1920: apparatus, tube, air, pressure, water, glass, gas, made, laboratory, mercury
1930: tube, apparatus, glass, air, mercury, laboratory, pressure, made, gas, small
1940: air, tube, apparatus, glass, laboratory, rubber, pressure, small, mercury, gas
1950: tube, apparatus, glass, air, chamber, instrument, small, laboratory, pressure, rubber
1960: tube, system, temperature, air, heat, chamber, power, high, instrument, control
1970: air, heat, power, system, temperature, chamber, high, flow, tube, design
1980: high, power, design, heat, system, systems, devices, instruments, control, large
1990: materials, high, power, current, applications, technology, devices, design, device, heat
2000: devices, device, materials, current, gate, high, light, silicon, material, technology

Page 6: supervised and relational topic models

Examples of topic modeling

[Figure: topics discovered from the journal Science, each shown as its top words:]

wild type, mutant, mutations, mutants, mutation
plants, plant, gene, genes, arabidopsis
p53, cell cycle, activity, cyclin, regulation
amino acids, cdna, sequence, isolated, protein
gene, disease, mutations, families, mutation
rna, dna, rna polymerase, cleavage, site
cells, cell, expression, cell lines, bone marrow
united states, women, universities, students, education
science, scientists, says, research, people
research, funding, support, nih, program
surface, tip, image, sample, device
laser, optical, light, electrons, quantum
materials, organic, polymer, polymers, molecules
volcanic, deposits, magma, eruption, volcanism
mantle, crust, upper mantle, meteorites, ratios
earthquake, earthquakes, fault, images, data
ancient, found, impact, million years ago, africa
climate, ocean, ice, changes, climate change
cells, proteins, researchers, protein, found
patients, disease, treatment, drugs, clinical
genetic, population, populations, differences, variation
fossil record, birds, fossils, dinosaurs, fossil
sequence, sequences, genome, dna, sequencing
bacteria, bacterial, host, resistance, parasite
development, embryos, drosophila, genes, expression
species, forest, forests, populations, ecosystems
synapses, ltp, glutamate, synaptic, neurons
neurons, stimulus, motor, visual, cortical
ozone, atmospheric, measurements, stratosphere, concentrations
sun, solar wind, earth, planets, planet
co2, carbon, carbon dioxide, methane, water
receptor, receptors, ligand, ligands, apoptosis
proteins, protein, binding, domain, domains
activated, tyrosine phosphorylation, activation, phosphorylation, kinase
magnetic, magnetic field, spin, superconductivity, superconducting
physicists, particles, physics, particle, experiment
surface, liquid, surfaces, fluid, model
reaction, reactions, molecule, molecules, transition state
enzyme, enzymes, iron, active site, reduction
pressure, high pressure, pressures, core, inner core
brain, memory, subjects, left, task
computer, problem, information, computers, problems
stars, astronomers, universe, galaxies, galaxy
virus, hiv, aids, infection, viruses
mice, antigen, t cells, antigens, immune response

Page 7: supervised and relational topic models

Supervised topic models

• These applications of topic modeling work in the same way.

• Fit a model using a likelihood criterion. Then, hope that the resulting model is useful for the task at hand.

• Supervised topic models and relational topic models fit topics explicitly to perform prediction.

• Useful for building topic models that can
  • Predict the rating of a review
  • Predict the category of an image
  • Predict the links emitted from a document

Page 8: supervised and relational topic models

Outline

1 Unsupervised topic models

2 Supervised topic models

3 Relational topic models

Page 9: supervised and relational topic models

Probabilistic modeling

1 Treat data as observations that arise from a generative probabilistic process that includes hidden variables.
  • For documents, the hidden variables reflect the thematic structure of the collection.

2 Infer the hidden structure using posterior inference.
  • What are the topics that describe this collection?

3 Situate new data into the estimated model.
  • How does this query or new document fit into the estimated topic structure?

Page 10: supervised and relational topic models

Intuition behind LDA

Simple intuition: Documents exhibit multiple topics.

Page 11: supervised and relational topic models

Generative model

gene 0.04, dna 0.02, genetic 0.01, ...

life 0.02, evolve 0.01, organism 0.01, ...

brain 0.04, neuron 0.02, nerve 0.01, ...

data 0.02, number 0.02, computer 0.01, ...

[Figure panels: Topics; Documents; Topic proportions and assignments]

• Each document is a random mixture of corpus-wide topics
• Each word is drawn from one of those topics

Page 12: supervised and relational topic models

The posterior distribution

[Figure panels: Topics; Documents; Topic proportions and assignments]

• In reality, we only observe the documents
• Our goal is to infer the underlying topic structure

Page 13: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model: plates over documents (D), words (N), and topics (K). Nodes: α (Dirichlet parameter) → θd (per-document topic proportions) → Zd,n (per-word topic assignment) → Wd,n (observed word), with Wd,n also depending on the topics βk, which have hyperparameter η.]

Each piece of the structure is a random variable.

Page 14: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model as on the previous slide.]

βk ∼ Dir(η)    k = 1, ..., K
θd ∼ Dir(α)    d = 1, ..., D
Zd,n | θd ∼ Mult(1, θd)    d = 1, ..., D; n = 1, ..., N
Wd,n | θd, zd,n, β1:K ∼ Mult(1, βzd,n)    d = 1, ..., D; n = 1, ..., N

Page 15: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model as above.]

1 Draw each topic βk ∼ Dir(η), for k ∈ {1, ..., K}.
2 For each document:
  1 Draw topic proportions θd ∼ Dir(α).
  2 For each word:
    1 Draw Zd,n ∼ Mult(θd).
    2 Draw Wd,n ∼ Mult(βzd,n).

(A code sketch of this process follows.)
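A minimal sketch of the generative process above in Python/NumPy. The sizes and hyperparameter values are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, N, V = 4, 10, 50, 1000                # topics, docs, words per doc, vocab (illustrative)
alpha, eta = 0.1 * np.ones(K), 0.01 * np.ones(V)

beta = rng.dirichlet(eta, size=K)            # topics: beta_k ~ Dir(eta), k = 1..K
corpus = []
for d in range(D):
    theta = rng.dirichlet(alpha)             # topic proportions: theta_d ~ Dir(alpha)
    z = rng.choice(K, size=N, p=theta)       # assignments: Z_{d,n} ~ Mult(theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # words: W_{d,n} ~ Mult(beta_{z_{d,n}})
    corpus.append((z, w))
```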

Page 16: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model as above.]

• From a collection of documents, infer
  • Per-word topic assignment zd,n
  • Per-document topic proportions θd
  • Per-corpus topic distributions βk

• Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.

Page 17: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model as above.]

• Computing the posterior is intractable:

  p(θ | α) ∏n=1..N p(zn | θ) p(wn | zn, β1:K)
  ---------------------------------------------------------
  ∫θ p(θ | α) ∏n=1..N ∑z=1..K p(zn | θ) p(wn | zn, β1:K) dθ

• Several approximation techniques have been developed.

Page 18: supervised and relational topic models

Latent Dirichlet allocation

[Graphical model as above.]

• Mean field variational methods (Blei et al., 2001, 2003)
• Expectation propagation (Minka and Lafferty, 2002)
• Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
• Collapsed variational inference (Teh et al., 2006)

Page 19: supervised and relational topic models

Example inference

[Figure: inferred topic proportions for an example document; a bar plot over topics (x-axis: topic index, 1 to ~100; y-axis: probability, 0.0 to 0.4), with most of the mass on a handful of topics.]

Page 20: supervised and relational topic models

Example topics

“Genetics”    “Evolution”    “Disease”      “Computers”

human         evolution      disease        computer
genome        evolutionary   host           models
dna           species        bacteria       information
genetic       organisms      diseases       data
genes         life           resistance     computers
sequence      origin         bacterial      system
gene          biology        new            network
molecular     groups         strains        systems
sequencing    phylogenetic   control        model
map           living         infectious     parallel
information   diversity      malaria        methods
genetics      group          parasite       networks
mapping       new            parasites      software
project       two            united         new
sequences     common         tuberculosis   simulations

Page 21: supervised and relational topic models

Used in exploratory tools of document collections

Page 22: supervised and relational topic models

LDA summary

• LDA is a powerful model for
  • Visualizing the hidden thematic structure in large corpora
  • Generalizing new data to fit into that structure

• LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999).

• For document collections and other grouped data, this might be more appropriate than a simple finite mixture.

• The same model was independently invented for population genetics analysis (Pritchard et al., 2000).

Page 23: supervised and relational topic models

LDA summary

• Modular: It can be embedded in more complicated models.
• General: The data generating distribution can be changed.
• Variational inference is fast; it lets us analyze large data sets.

• See Blei et al., 2003 for details and a quantitative comparison. See my website for code and other papers.

• Jonathan Chang’s excellent R package “lda” contains Gibbs sampling code for this model and many others.

Page 24: supervised and relational topic models

Supervised topic models

• But LDA is an unsupervised model. How can we build a topic model that is good at the task we care about?

• Many data are paired with response variables.
  • User reviews paired with a number of stars
  • Web pages paired with a number of “diggs”
  • Documents paired with links to other documents
  • Images paired with a category

• Supervised topic models are topic models of documents and responses, fit to find topics predictive of the response.

Page 25: supervised and relational topic models

Supervised LDA

[Graphical model: LDA with an additional per-document response node Yd, governed by parameters η and σ².]

1 Draw topic proportions θ | α ∼ Dir(α).
2 For each word:
  • Draw topic assignment zn | θ ∼ Mult(θ).
  • Draw word wn | zn, β1:K ∼ Mult(βzn).
3 Draw response variable y | z1:N, η, σ² ∼ N(η⊤z̄, σ²), where z̄ = (1/N) ∑n=1..N zn.

(A code sketch of step 3 follows.)
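A minimal sketch in Python/NumPy, extending the LDA sketch above with the response draw; eta, sigma2, and the sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 4, 50
eta = rng.normal(size=K)                      # regression coefficients (illustrative)
sigma2 = 0.5                                  # response variance (illustrative)

theta = rng.dirichlet(0.1 * np.ones(K))       # step 1: theta ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)            # step 2: z_n ~ Mult(theta)
zbar = np.bincount(z, minlength=K) / N        # zbar = (1/N) sum_n z_n (one-hot coded)
y = rng.normal(eta @ zbar, np.sqrt(sigma2))   # step 3: y ~ N(eta' zbar, sigma^2)
```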

Page 26: supervised and relational topic models

Supervised LDA

[Graphical model as on the previous slide.]

• The response variable y is drawn after the document because it depends on z1:N, an assumption of partial exchangeability.

• Consequently, y is necessarily conditioned on the words.

• In a sense, this blends generative and discriminative modeling.

Page 27: supervised and relational topic models

Supervised LDA

[Graphical model as above.]

• Given a set of document-response pairs, fit the model parameters by maximum likelihood.

• Given a new document, compute a prediction of its response.

• Both of these activities hinge on variational inference.

Page 28: supervised and relational topic models

Variational inference (in general)

• Variational methods are a deterministic alternative to MCMC.

• Let x1:N be observations and z1:M be latent variables

• Our goal is to compute the posterior distribution

p(z1:M | x1:N) = p(z1:M, x1:N) / ∫ p(z1:M, x1:N) dz1:M

• For many interesting distributions, the marginal likelihood of the observations is difficult to efficiently compute.

Page 29: supervised and relational topic models

Variational inference

• Use Jensen’s inequality to bound the log probability of the observations:

log p(x1:N) = log ∫ p(z1:M, x1:N) dz1:M
            = log ∫ p(z1:M, x1:N) [qν(z1:M) / qν(z1:M)] dz1:M
            ≥ Eqν[log p(z1:M, x1:N)] − Eqν[log qν(z1:M)]

• We have introduced a distribution of the latent variables with free variational parameters ν.

• We optimize those parameters to tighten this bound.

• This is the same as finding the member of the family qν that is closest in KL divergence to p(z1:M | x1:N). (The identity below makes this precise.)
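The equivalence in the last bullet follows from a standard one-line identity (not on the slide): the gap in Jensen's bound is exactly a KL divergence.

```latex
\log p(x_{1:N})
  = \underbrace{\mathbb{E}_{q_\nu}[\log p(z_{1:M}, x_{1:N})]
      - \mathbb{E}_{q_\nu}[\log q_\nu(z_{1:M})]}_{\text{the bound above}}
  + \mathrm{KL}\!\left( q_\nu(z_{1:M}) \,\middle\|\, p(z_{1:M} \mid x_{1:N}) \right)
```

Since the KL term is nonnegative and the left side does not depend on ν, maximizing the bound over ν is the same as minimizing the KL divergence to the true posterior.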

Page 30: supervised and relational topic models

Mean-field variational inference

• The factorization of qν determines the complexity of the optimization.

• In mean field variational inference, qν is fully factored:

qν(z1:M) = ∏m=1..M qνm(zm)

• The latent variables are independent.
  • Each is governed by its own variational parameter νm.

• In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).

Page 31: supervised and relational topic models

MFVI and conditional exponential families

• Suppose the distribution of each latent variable conditional on all other variables is in the exponential family:

p(zm | z−m, x) = hm(zm) exp{gm(z−m, x)⊤zm − am(gm(z−m, x))}

• Assume qν is fully factorized, and each factor is in the same exponential family as the corresponding conditional:

qνm(zm) = hm(zm) exp{νm⊤zm − am(νm)}

Page 32: supervised and relational topic models

MFVI and conditional exponential families

• Variational inference is the following coordinate ascent algorithm

νm = Eqν [gm(Z−m,x)]

• Notice the relationship to Gibbs sampling.

Page 33: supervised and relational topic models

Variational inference

• Alternative to MCMC; replace sampling with optimization.

• Deterministic approximation to posterior distribution.

• Uses established optimization methods (block coordinate ascent; Newton-Raphson; interior-point).

• Faster, more scalable than MCMC for large problems.

• Biased, whereas MCMC is not.

• Emerging as a useful framework for fully Bayesian and empirical Bayesian inference problems. Many open issues!

• Good papers: Beal’s Ph.D. thesis, Wainwright and Jordan (2009)

Page 34: supervised and relational topic models

Variational inference in sLDA

[Graphical model of sLDA, as above.]

• In sLDA the variational bound is

E[log p(θ | α)] + ∑n=1..N E[log p(Zn | θ)] + ∑n=1..N E[log p(wn | Zn, β1:K)] + E[log p(y | Z1:N, η, σ²)] + H(q)

• As in Blei, Ng, and Jordan (2003), we use the fully factorized variational distribution

q(θ, z1:N | γ, φ1:N) = q(θ | γ) ∏n=1..N q(zn | φn)

Page 35: supervised and relational topic models

Variational inference in sLDA

• The distinguishing term is

E[log p(y | Z1:N, η, σ²)] = −(1/2) log(2πσ²) − (y² − 2y η⊤E[Z̄] + η⊤E[Z̄Z̄⊤]η) / (2σ²)

• The first expectation is

E[Z̄] = φ̄ := (1/N) ∑n=1..N φn.

• The second expectation is

E[Z̄Z̄⊤] = (1/N²) (∑n=1..N ∑m≠n φn φm⊤ + ∑n=1..N diag{φn}).

• Linear in φn, which leads to an easy coordinate ascent algorithm. (A code sketch of these moments follows.)
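A minimal sketch of the two moments above in Python/NumPy, computed from the variational parameters stacked as an N × K matrix phi whose rows are the φn (illustrative, not the talk's code):

```python
import numpy as np

def zbar_moments(phi):
    """E[Zbar] and E[Zbar Zbar^T] under the mean-field q, from an N x K phi."""
    N, K = phi.shape
    phibar = phi.mean(axis=0)                 # E[Zbar] = (1/N) sum_n phi_n
    s = phi.sum(axis=0)                       # sum_n phi_n
    cross = np.outer(s, s) - phi.T @ phi      # sum_{n != m} phi_n phi_m^T
    second = (cross + np.diag(s)) / N**2      # E[Zbar Zbar^T]
    return phibar, second
```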

Page 36: supervised and relational topic models

Maximum likelihood estimation

• The M-step is an MLE under expected sufficient statistics.

• Define
  • y = y1:D, the response vector
  • A, the D × K matrix whose rows are Z̄d⊤.

• The MLE of the coefficients solves the expected normal equations

E[A⊤A] η = E[A]⊤y  ⇒  η̂new ← (E[A⊤A])⁻¹ E[A]⊤y

• The MLE of the variance is

σ̂²new ← (1/D) {y⊤y − y⊤E[A] (E[A⊤A])⁻¹ E[A]⊤y}

(A code sketch follows.)
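A minimal sketch of this M-step in Python/NumPy. Here EA stands for E[A] (a D × K matrix whose rows are the E[Z̄d], i.e., the φ̄d) and EAtA for E[A⊤A]; both can be accumulated from the per-document moments in the previous sketch. Names are illustrative.

```python
import numpy as np

def m_step(EA, EAtA, y):
    """Solve the expected normal equations for eta, then plug in for sigma^2."""
    eta = np.linalg.solve(EAtA, EA.T @ y)     # E[A'A] eta = E[A]' y
    D = y.shape[0]
    sigma2 = (y @ y - y @ (EA @ eta)) / D     # (1/D){y'y - y' E[A] eta_hat}
    return eta, sigma2
```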

Page 37: supervised and relational topic models

Prediction

• We have fit sLDA parameters to a corpus, using variational EM.

• We have a new document w1:N with unknown response value.

• First, run variational inference in the unsupervised LDA model, to obtain γ and φ1:N for the new document. (LDA ⇔ integrating the unobserved Y out of sLDA.)

• Predict y using the sLDA expected value:

E[Y | w1:N, α, β1:K, η, σ²] ≈ η⊤ Eq[Z̄] = η⊤ φ̄
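The prediction rule is then one line, given the φn from unsupervised LDA inference on the new document (illustrative sketch):

```python
def predict(phi, eta):
    """E[Y | w_{1:N}] ~= eta' E_q[Zbar] = eta' phibar."""
    return eta @ phi.mean(axis=0)
```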

Page 38: supervised and relational topic models

Example: Movie reviews

[Figure: the ten topics of the sLDA model, arranged along their regression coefficients (x-axis roughly −30 to +20). Topics at the negative end contain words like "least, problem, unfortunately, supposed, worse, flat, dull" and "bad, guys, watchable, its, not, one, movie"; topics near zero are dominated by function words like "have, like, you, was, just, some, out" and "not, about, movie, all, would, they, its"; topics at the positive end contain words like "more, has, than, films, director, will, characters"; "however, cinematography, screenplay, performances, pictures, effective, picture"; and "both, motion, simple, perfect, fascinating, power, complex". One negative topic reads "awful, featuring, routine, dry, offered, charlie, paris".]

• 10-topic sLDA model on movie reviews (Pang and Lee, 2005).
• Response: number of stars associated with each review.

• Each component of coefficient vector η is associated with a topic.

Page 39: supervised and relational topic models

Predictive R²

(sLDA is red.)

[Figure: predictive R² as a function of the number of topics (5 to 50; y-axis 0.0 to 0.5). The sLDA curve (red) lies above the comparison curve across the range.]

Page 40: supervised and relational topic models

Held out likelihood

(sLDA is red.)

[Figure: per-word held-out log likelihood as a function of the number of topics (5 to 50; y-axis roughly −6.42 to −6.37), with sLDA (red) compared against the alternative.]

Page 41: supervised and relational topic models

Diverse response types with GLMs

• Want to work with response variables that don’t live in the reals.
  • binary / multiclass classification
  • count data
  • waiting time

• Model the response with a generalized linear model:

p(y | ζ, δ) = h(y, δ) exp{(ζy − A(ζ)) / δ},

where ζ = η⊤z̄.

• Complicates inference, but allows for flexible modeling.
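For example, a binary response: taking A(ζ) = log(1 + e^ζ) and δ = 1 in the density above gives logistic regression on z̄. A minimal sketch (illustrative names):

```python
import numpy as np

def logistic_response_prob(eta, zbar):
    """p(y = 1 | zbar) under the GLM with A(zeta) = log(1 + exp(zeta)), delta = 1."""
    zeta = eta @ zbar                   # natural parameter: zeta = eta' zbar
    return 1.0 / (1.0 + np.exp(-zeta))
```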

Page 42: supervised and relational topic models

Example: Multi-class classification

[Figures from the multi-class sLDA image classification work (Wang, Blei, Fei-Fei; CVPR 2009). One figure shows LabelMe examples: for each scene class (highway, inside city, tall building, street, forest, coast, mountain, open country), a correctly classified image and a misclassified one (true class in brackets), each with predicted annotations such as "car, sign, road" or "snowy mountain, tree trunk". A second figure plots average classification accuracy against the number of topics (20 to 120) on the LabelMe and UIUC-Sport datasets, comparing multi-class sLDA with annotations, multi-class sLDA, Fei-Fei and Perona (2005), and Bosch et al. (2006). The sLDA variants perform best, reducing the error of Fei-Fei and Perona by at least 10% on both data sets and more for Bosch et al.; unsupervised LDA combined with KNN performs worst among the topic models, and an SVM on SIFT features performs far worse still (47% on LabelMe, 20% on UIUC-Sport). The sLDA models do not overfit until around 100 topics, while Fei-Fei and Perona begins to overfit at 40 topics, suggesting that blending generative and discriminative classification can handle more latent features than a purely generative approach. For annotation, multi-class sLDA with annotations slightly beats corr-LDA (Blei and Jordan, 2003) in F-measure@5, e.g. 38.7% vs. 38.2% on LabelMe and 35.0% vs. 34.7% on UIUC-Sport at 100 topics, so a single trained model gives competitive annotation and superior classification.]

(sLDA for image classification, with Chong Wang)

Page 43: supervised and relational topic models

Supervised topic models

• sLDA enables model-based regression where the predictor “variable” is a text document.

• It can easily be used wherever LDA is used in an unsupervised fashion (e.g., images, genes, music).

• sLDA is a supervised dimension-reduction technique, whereas LDA performs unsupervised dimension reduction.

• LDA + regression compared to sLDA is like principal components regression compared to partial least squares.

• Paper: Blei and McAuliffe, NIPS 2007.

Page 44: supervised and relational topic models

Relational topic models

[Figure: a citation network of machine learning papers from the Cora corpus. Nodes are documents, shown with abstract snippets and titles such as "Irrelevant features and the subset selection problem", "Learning with many irrelevant features", "Evaluation and selection of biases in machine learning", "Utilizing prior concepts for learning", "Improving tactical plans with genetic algorithms", "An evolutionary approach to learning in robots", and "Using a genetic algorithm to learn strategies for collision avoidance and local navigation"; edges are citations.]

• Many data sets contain connected observations.

• For example:
  • Citation networks of documents
  • Hyperlinked networks of web pages
  • Friend-connected social network profiles

Page 45: supervised and relational topic models

Relational topic models

[Same citation-network figure as the previous slide.]

• Research has focused on finding communities and patterns in the link structure of these networks (Kemp et al. 2004, Hoff et al. 2002, Hofman and Wiggins 2007, Airoldi et al. 2008).

• By adapting supervised topic modeling, we can build a good model of content and structure.

• RTMs find related hidden structure in both types of data.

Page 46: supervised and relational topic models

Relational topic models

[Graphical model: two LDA document plates (θd → zd,n → wd,n over Nd words, and θd' → zd',n → wd',n over Nd' words), sharing the topics βk (K plate) and Dirichlet parameter α, joined by a per-pair link variable yd,d' with parameters η.]

• A binary response variable is associated with each pair of documents.

• Adapt the variational EM algorithm for sLDA with a binary GLM response model (with different link probability functions; one is sketched below).

• Allows predictions that are out of reach for traditional models.
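A minimal sketch of one such link probability function: the logistic form from the RTM paper, scored on the elementwise product of the two documents' empirical topic frequencies. Parameter names are illustrative.

```python
import numpy as np

def link_prob(zbar_d, zbar_dp, eta, nu):
    """p(y_{d,d'} = 1) = sigma(eta' (zbar_d * zbar_d') + nu)."""
    score = eta @ (zbar_d * zbar_dp) + nu   # overlap in topic usage drives links
    return 1.0 / (1.0 + np.exp(-score))
```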

Page 47: supervised and relational topic models

Predictive performance of one type given the other

[Figure: link log likelihood (top row) and word log likelihood (bottom row) as functions of the number of topics (5 to 25) on the Cora, WebKB, and PNAS corpora, comparing five models: RTM (ψσ), RTM (ψe), LDA + Regression, Mixed-Membership, and Unigram/Bernoulli. The RTM variants attain the highest likelihoods on both measures.]

Cora corpus (McCallum et al., 2000). Note: traditional models of networks cannot perform these tasks.

Page 48: supervised and relational topic models

Predictive performance of one type given the other

[Same figure as the previous slide.]

WebKB corpus (Craven et al., 1998)

Page 49: supervised and relational topic models

Predictive performance of one type given the other

[Same figure as the previous slide.]

PNAS corpus (courtesy of JSTOR)

Page 50: supervised and relational topic models

Predicting links from documents

Table 2. Top eight link predictions made by RTM (ψe) and LDA + Regression for two documents (italicized) from Cora. The models were fit with 10 topics. Boldfaced titles indicate actual documents cited by or citing each document. Over the whole corpus, RTM improves precision over LDA + Regression by 80% when evaluated on the first 20 documents retrieved.

Markov chain Monte Carlo convergence diagnostics: A comparative review

RTM (ψe):
  Minorization conditions and convergence rates for Markov chain Monte Carlo
  Rates of convergence of the Hastings and Metropolis algorithms
  Possible biases induced by MCMC convergence diagnostics
  Bounding convergence time of the Gibbs sampler in Bayesian image restoration
  Self regenerative Markov chain Monte Carlo
  Auxiliary variable methods for Markov chain Monte Carlo with applications
  Rate of Convergence of the Gibbs Sampler by Gaussian Approximation
  Diagnosing convergence of Markov chain Monte Carlo algorithms

LDA + Regression:
  Exact Bound for the Convergence of Metropolis Chains
  Self regenerative Markov chain Monte Carlo
  Minorization conditions and convergence rates for Markov chain Monte Carlo
  Gibbs-markov models
  Auxiliary variable methods for Markov chain Monte Carlo with applications
  Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Models
  Mediating instrumental variables
  A qualitative framework for probabilistic inference
  Adaptation for Self Regenerative MCMC

Competitive environments evolve better solutions for complex tasks

RTM (ψe):
  Coevolving High Level Representations
  A Survey of Evolutionary Strategies
  Genetic Algorithms in Search, Optimization and Machine Learning
  Strongly typed genetic programming in evolving cooperation strategies
  Solving combinatorial problems using evolutionary algorithms
  A promising genetic algorithm approach to job-shop scheduling...
  Evolutionary Module Acquisition
  An Empirical Investigation of Multi-Parent Recombination Operators...

LDA + Regression:
  A New Algorithm for DNA Sequence Assembly
  Identification of protein coding regions in genomic DNA
  Solving combinatorial problems using evolutionary algorithms
  A promising genetic algorithm approach to job-shop scheduling...
  A genetic algorithm for passive management
  The Performance of a Genetic Algorithm on a Chaotic Objective Function
  Adaptive global optimization with local search
  Mutation rates as adaptations

Table 2 illustrates suggested citations using RTM (ψe) and LDA + Regression as predictive models. These suggestions were computed from a model fit on one of the folds of the Cora data. The top results illustrate suggested links for "Markov chain Monte Carlo convergence diagnostics: A comparative review".

Given a new document, which documents is it likely to link to?

Page 51: supervised and relational topic models

Predicting links from documents

[Same table as the previous slide.]

Given a new document, which documents is it likely to link to?

Page 52: supervised and relational topic models

Spatially consistent topics

[Fig 5. A comparison between RTM (left) and LDA (right) of topic distributions on local news data. Each color/row depicts a single topic (Topics 1-5). Each state's color intensity indicates the magnitude of that topic's component. The corresponding words associated with each topic are given in Table 3. Whereas LDA finds geographically diffuse topics, RTM, by modeling spatial connectivity, finds coherent regions.]

• Links are the adjacency matrix of states

• Documents are geographically-tagged news articles from Yahoo! Not exchangeable.

• RTM finds spatially consistent topics.

Page 53: supervised and relational topic models

Summary

• Relational topic modeling allows us to analyze connected documents, or other data for which the mixed-membership assumptions are appropriate.

• Traditional models cannot predict with new and unlinked data.

• RTMs allow for such predictions:
  • links given the new words of a document
  • words given the links of a new document

• Paper: Chang and Blei, AISTATS 2009.

Page 54: supervised and relational topic models

JSTOR Discipline Analysis

• Another kind of “supervised” topic model is one where the topics have external meaning (Ramage and Manning, 2009).

• JSTOR attaches each journal to a discipline.

• E.g., JASA is in Statistics; PNAS is in General Science.

• We can measure how interdisciplinary the articles are:
  • Compute a distribution over terms for each discipline
  • Find topic proportions for each article

• (Work done by Sean Gerrish)

Page 55: supervised and relational topic models

[Screenshot: the JSTOR Discipline Browser showing results for the query "empirical bayes". Sliders adjust discipline weights (Finance, Business, Statistics, Economics, Philosophy, Psychology, History of Science & Technology, and many others, including inactive disciplines). Results list articles with their journal disciplines, e.g.: Frances Stokes Berry and William D. Berry, "Tax Innovation in the States: Capitalizing on Political Opportunity", American Journal of Political Science (1992), pp. 715-742 [Political Science]; Blake LeBaron, "Chaos and Nonlinear Forecastability in Economics and Finance", Philosophical Transactions: Physical Sciences and Engineering (1994), pp. 397-404 [Mathematics, Biological Sciences, General Science]; Gerald Marwell and Pamela Oliver, "Reply: Theory Is Not a Social Dilemma", Social Psychology Quarterly (1994), p. 373 [Psychology, Sociology]; Kun Y. Park, "'Pouring New Wine into Fresh Wineskins': Defense Spending and Economic Growth in LDCs with Application to South Korea", Journal of Peace Research (1993), pp. 79-93 [Political Science]; Herbert Hovenkamp, "Positivism in Law & Economics", California Law Review (1990), pp. 815-852 [Law]; Virginia Gray and David Lowery, "The Diversity of State Interest Group Systems", Political Research Quarterly (1993), pp. 81-97 [Political Science]. The current index covers 18,032 documents, a small sample of JSTOR's collections.]

http://dbrowser.jstor.org

Page 56: supervised and relational topic models

General Science over Time

[Figure: log proportion of discipline content over time in two general-science journals. Left panel: PNAS, 1920-2000. Right panel: Science, 1880-2000. Curves show the estimated proportions of Biological Sciences, Botany & Plant Sciences, Developmental & Cell Biology, Ecology & Evolutionary Biology, General Science, Geography, History of Science & Technology, and Mathematics.]

Page 57: supervised and relational topic models

Navel Gazing

[Figure: log proportion of discipline content over time in two statistics journals. Left panel: Journal of the American Statistical Association, 1940-2000. Right panel: The Annals of Statistics, 1975-2005. Curves show the estimated proportions of Business, Economics, Finance, Mathematics, and Statistics.]

Page 58: supervised and relational topic models

“We should seek out unfamiliar summaries of observational material, and establish their useful properties... And still more novelty can come from finding, and evading, still deeper lying constraints.”

(John Tukey, The Future of Data Analysis, 1962)