Supervised and Relational Topic Models
David M. Blei
Department of Computer Science, Princeton University
October 5, 2009
Joint work with Jonathan Chang and Jon McAuliffe
Topic modeling
• Large electronic archives of document collections require new statistical tools for analyzing text.
• Topic models have emerged as a powerful technique for unsupervised analysis of large document collections.
• Topic models posit latent topics in text using hidden random variables, and uncover that structure with posterior inference.
• Useful for tasks like browsing, search, information retrieval, etc.
Examples of topic modeling
[Figure: five example topics from a corpus of law review articles, with top words such as contractual/expectation/promises/breach, employment/industrial/jobs/employees, female/men/women/sexual/discrimination/harassment, markets/earnings/investors/managers/firm, and criminal/discretion/justice/officer/parole/inmates.]
Examples of topic modeling
[Figure: topics discovered from a corpus of computer science abstracts, e.g. algorithms and complexity bounds (algorithm, time, log, bound, polynomial), logic and programming languages (logic, programs, systems, language), distributed systems (consensus, messages, protocol, asynchronous), graphs (graph, edge, minimum, vertices), queuing networks (networks, queuing, server, approximations), routing (routing, adaptive, network, packets), databases and constraints (database, constraints, algebra, relational, queries), learning theory (learning, learnable, statistical, examples, classes), and quantum automata (quantum, automata, languages). Example paper titles are grouped by their dominant topics, e.g. “An optimal algorithm for intersecting line segments in the plane,” “Quantum lower bounds by polynomials,” “How bad is selfish routing?,” “Single-class bounds of multi-class queuing networks,” “A mechanical proof of the Church-Rosser theorem,” and “On XML integrity constraints in the presence of DTDs.”]
Examples of topic modeling
[Figure: a single topic tracked by decade (1880–2000). Its top words drift from electric/machine/power/engine/steam/battery/wire (1880–1900), through apparatus/air/water/tube/laboratory/pressure/glass/mercury (1910–1950), to tube/system/temperature/heat/chamber/control (1960), air/heat/power/system/flow/design (1970), high/power/design/heat/systems/devices (1980), materials/high/power/current/applications/technology (1990), and devices/materials/current/gate/silicon/technology (2000).]
Examples of topic modeling
[Figure: topics discovered from the journal Science, spanning genetics (wild type, mutant, mutations), plant biology (plants, arabidopsis), cell biology (p53, cell cycle, cyclin; cells, expression, cell lines, bone marrow), molecular biology (amino acids, cDNA, sequence; RNA, DNA, RNA polymerase, cleavage; sequence, genome, sequencing), disease and medicine (gene, disease, mutations, families; patients, disease, treatment, drugs, clinical; virus, HIV, AIDS, infection; mice, antigen, T cells, immune response), neuroscience (synapses, LTP, glutamate, neurons; neurons, stimulus, motor, visual, cortical; brain, memory, subjects, task), earth science (volcanic, magma, eruption; mantle, crust, meteorites; earthquake, fault; climate, ocean, ice, climate change; pressure, core, inner core; ozone, stratosphere; CO2, carbon dioxide, methane), astronomy and physics (sun, solar wind, planets; stars, astronomers, galaxies; magnetic, spin, superconductivity; physicists, particles; surface, liquid, fluid), chemistry (reactions, molecules, transition state; enzymes, iron, active site; receptor, ligands; proteins, binding, domains; phosphorylation, kinase; materials, polymers, molecules; surface, tip, image, laser, optical, quantum), ecology and evolution (genetic, populations, variation; fossil record, dinosaurs; bacteria, host, resistance, parasite; development, embryos, drosophila; species, forests, ecosystems; ancient, impact, million years ago, africa), computing (computer, information, problems), and science policy (united states, women, universities, education; science, scientists, research; research, funding, NIH).]
Supervised topic models
• These applications of topic modeling work in the same way.
• Fit a model using a likelihood criterion. Then, hope that the resulting model is useful for the task at hand.
• Supervised topic models and relational topic models fit topics explicitly to perform prediction.
• Useful for building topic models that can
  • Predict the rating of a review
  • Predict the category of an image
  • Predict the links emitted from a document
Outline
1 Unsupervised topic models
2 Supervised topic models
3 Relational topic models
Probabilistic modeling
1 Treat data as observations that arise from a generative probabilistic process that includes hidden variables
  • For documents, the hidden variables reflect the thematic structure of the collection.
2 Infer the hidden structure using posterior inference
  • What are the topics that describe this collection?
3 Situate new data into the estimated model.
  • How does this query or new document fit into the estimated topic structure?
Intuition behind LDA
Simple intuition: Documents exhibit multiple topics.
Generative model
[Figure: four topics as distributions over words (gene 0.04, dna 0.02, genetic 0.01, …; life 0.02, evolve 0.01, organism 0.01, …; brain 0.04, neuron 0.02, nerve 0.01, …; data 0.02, number 0.02, computer 0.01, …), a document, and its topic proportions and per-word topic assignments.]

• Each document is a random mixture of corpus-wide topics
• Each word is drawn from one of those topics
The posterior distribution
[Figure: the schematic from the previous slide, with the topics, topic proportions, and assignments now hidden.]

• In reality, we only observe the documents
• Our goal is to infer the underlying topic structure
Latent Dirichlet allocation
[Graphical model: topics $\beta_k$ ($k = 1, \dots, K$, with topic hyperparameter $\eta$), per-document topic proportions $\theta_d$ (with Dirichlet parameter $\alpha$), per-word topic assignments $Z_{d,n}$, and observed words $W_{d,n}$, in plates over the $N$ words and $D$ documents.]
Each piece of the structure is a random variable.
Latent Dirichlet allocation
[Graphical model, as above.]

$\beta_k \sim \text{Dir}(\eta)$, $k = 1, \dots, K$
$\theta_d \sim \text{Dir}(\alpha)$, $d = 1, \dots, D$
$Z_{d,n} \mid \theta_d \sim \text{Mult}(1, \theta_d)$, $d = 1, \dots, D$, $n = 1, \dots, N$
$W_{d,n} \mid \theta_d, z_{d,n}, \beta_{1:K} \sim \text{Mult}(1, \beta_{z_{d,n}})$, $d = 1, \dots, D$, $n = 1, \dots, N$
Latent Dirichlet allocation
[Graphical model, as above.]

1 Draw each topic $\beta_k \sim \text{Dir}(\eta)$, for $k \in \{1, \dots, K\}$.
2 For each document:
  1 Draw topic proportions $\theta_d \sim \text{Dir}(\alpha)$.
  2 For each word:
    1 Draw $Z_{d,n} \sim \text{Mult}(\theta_d)$.
    2 Draw $W_{d,n} \sim \text{Mult}(\beta_{z_{d,n}})$.
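The generative process above translates directly into a sampler. A minimal Python sketch (the corpus sizes, vocabulary size, and hyperparameter values are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lda_corpus(D=100, N=50, K=10, V=1000, alpha=0.1, eta=0.01):
    """Sample a synthetic corpus from the LDA generative process."""
    beta = rng.dirichlet(np.full(V, eta), size=K)        # topics beta_k
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))         # proportions theta_d
        z = rng.choice(K, size=N, p=theta)               # assignments Z_{d,n}
        w = np.array([rng.choice(V, p=beta[k]) for k in z])  # words W_{d,n}
        docs.append((z, w))
    return beta, docs
```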
Latent Dirichlet allocation
[Graphical model, as above.]

• From a collection of documents, infer
  • Per-word topic assignments $z_{d,n}$
  • Per-document topic proportions $\theta_d$
  • Per-corpus topic distributions $\beta_k$
• Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.
Latent Dirichlet allocation
[Graphical model, as above.]

• Computing the posterior is intractable:

$$\frac{p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})}{\int_\theta p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z=1}^{K} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})}$$
• Several approximation techniques have been developed.
Latent Dirichlet allocation
[Graphical model, as above.]

• Mean field variational methods (Blei et al., 2001, 2003)
• Expectation propagation (Minka and Lafferty, 2002)
• Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
• Collapsed variational inference (Teh et al., 2006)
Example inference
[Bar plot: inferred topic proportions for an example document; topics 1–100 on the x-axis, probability (0.0–0.4) on the y-axis.]
Example topics
“Genetics” “Evolution” “Disease” “Computers”
human         evolution      disease        computer
genome        evolutionary   host           models
dna           species        bacteria       information
genetic       organisms      diseases       data
genes         life           resistance     computers
sequence      origin         bacterial      system
gene          biology        new            network
molecular     groups         strains        systems
sequencing    phylogenetic   control        model
map           living         infectious     parallel
information   diversity      malaria        methods
genetics      group          parasite       networks
mapping       new            parasites      software
project       two            united         new
sequences     common         tuberculosis   simulations
Used in exploratory tools for document collections
LDA summary
• LDA is a powerful model for
  • Visualizing the hidden thematic structure in large corpora
  • Generalizing new data to fit into that structure
• LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999).
• For document collections and other grouped data, this might be more appropriate than a simple finite mixture.
• The same model was independently invented for population genetics analysis (Pritchard et al., 2000).
LDA summary
• Modular: It can be embedded in more complicated models.
• General: The data generating distribution can be changed.
• Variational inference is fast; it lets us analyze large data sets.
• See Blei et al., 2003 for details and a quantitative comparison. See my web-site for code and other papers.
• Jonathan Chang’s excellent R package “lda” contains Gibbs sampling code for this model and many others.
Supervised topic models
• But LDA is an unsupervised model. How can we build a topic model that is good at the task we care about?
• Many data are paired with response variables.
  • User reviews paired with a number of stars
  • Web pages paired with a number of “diggs”
  • Documents paired with links to other documents
  • Images paired with a category
• Supervised topic models are topic models of documents and responses, fit to find topics predictive of the response.
Supervised LDA
[Graphical model: sLDA adds to LDA an observed per-document response $Y_d$ with parameters $\eta, \sigma^2$.]

1 Draw topic proportions $\theta \mid \alpha \sim \text{Dir}(\alpha)$.
2 For each word:
  • Draw topic assignment $z_n \mid \theta \sim \text{Mult}(\theta)$.
  • Draw word $w_n \mid z_n, \beta_{1:K} \sim \text{Mult}(\beta_{z_n})$.
3 Draw response variable $y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^\top \bar{z}, \sigma^2)$, where $\bar{z} = (1/N)\sum_{n=1}^{N} z_n$.
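Step 3 is the only change from LDA: the response is a noisy linear function of the empirical topic frequencies. A sketch of that one step, assuming a document's assignments are available as an array of topic indices (the function name and arguments are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_slda_response(z, K, eta_coef, sigma2):
    """Draw y ~ N(eta^T zbar, sigma^2) for one document, where
    zbar holds the empirical frequencies of the assignments z."""
    zbar = np.bincount(z, minlength=K) / len(z)
    return rng.normal(eta_coef @ zbar, np.sqrt(sigma2))
```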
Supervised LDA
[Graphical model, as above.]

• The response variable $y$ is drawn after the document because it depends on $z_{1:N}$, an assumption of partial exchangeability.
• Consequently, $y$ is necessarily conditioned on the words.
• In a sense, this blends generative and discriminative modeling.
Supervised LDA
[Graphical model, as above.]

• Given a set of document-response pairs, fit the model parameters by maximum likelihood.
• Given a new document, compute a prediction of its response.
• Both of these activities hinge on variational inference.
Variational inference (in general)
• Variational methods are a deterministic alternative to MCMC.
• Let $x_{1:N}$ be observations and $z_{1:M}$ be latent variables.
• Our goal is to compute the posterior distribution

$$p(z_{1:M} \mid x_{1:N}) = \frac{p(z_{1:M}, x_{1:N})}{\int p(z_{1:M}, x_{1:N})\, dz_{1:M}}$$

• For many interesting distributions, the marginal likelihood of the observations is difficult to compute efficiently.
Variational inference
• Use Jensen’s inequality to bound the log probability of the observations:

$$\begin{aligned}
\log p(x_{1:N}) &= \log \int p(z_{1:M}, x_{1:N})\, dz_{1:M} \\
&= \log \int p(z_{1:M}, x_{1:N}) \frac{q_\nu(z_{1:M})}{q_\nu(z_{1:M})}\, dz_{1:M} \\
&\geq \mathbb{E}_{q_\nu}[\log p(z_{1:M}, x_{1:N})] - \mathbb{E}_{q_\nu}[\log q_\nu(z_{1:M})]
\end{aligned}$$

• We have introduced a distribution of the latent variables with free variational parameters $\nu$.
• We optimize those parameters to tighten this bound.
• This is the same as finding the member of the family $q_\nu$ that is closest in KL divergence to $p(z_{1:M} \mid x_{1:N})$.
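The last bullet follows from one line of algebra; written out (a standard identity, not on the original slide):

```latex
\log p(x_{1:N})
  = \underbrace{\mathbb{E}_{q_\nu}[\log p(z_{1:M}, x_{1:N})]
      - \mathbb{E}_{q_\nu}[\log q_\nu(z_{1:M})]}_{\text{the bound}}
  + \underbrace{\mathrm{KL}\big(q_\nu(z_{1:M}) \,\|\, p(z_{1:M} \mid x_{1:N})\big)}_{\geq\, 0}
```

Since the left-hand side does not depend on $\nu$, raising the bound necessarily lowers the KL divergence.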
Mean-field variational inference
• The factorization of $q_\nu$ determines the complexity of the optimization.
• In mean field variational inference, $q_\nu$ is fully factored:

$$q_\nu(z_{1:M}) = \prod_{m=1}^{M} q_{\nu_m}(z_m)$$

• The latent variables are independent.
• Each is governed by its own variational parameter $\nu_m$.
• In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).
MFVI and conditional exponential families
• Suppose the distribution of each latent variable conditional on all other variables is in the exponential family:

$$p(z_m \mid z_{-m}, x) = h_m(z_m) \exp\{g_m(z_{-m}, x)^\top z_m - a_m(g_m(z_{-m}, x))\}$$

• Assume $q_\nu$ is fully factorized, and each factor is in the same exponential family as the corresponding conditional:

$$q_{\nu_m}(z_m) = h_m(z_m) \exp\{\nu_m^\top z_m - a_m(\nu_m)\}$$
MFVI and conditional exponential families
• Variational inference is the following coordinate ascent algorithm:

$$\nu_m = \mathbb{E}_{q_\nu}[g_m(Z_{-m}, x)]$$
• Notice the relationship to Gibbs sampling.
Variational inference
• Alternative to MCMC; replace sampling with optimization.
• Deterministic approximation to posterior distribution.
• Uses established optimization methods (block coordinate ascent; Newton-Raphson; interior-point).
• Faster, more scalable than MCMC for large problems.
• Biased, whereas MCMC is not.
• Emerging as a useful framework for fully Bayesian and empirical Bayesian inference problems. Many open issues!
• Good papers: Beal’s Ph.D. thesis, Wainwright and Jordan (2009)
Variational inference in sLDA
[Graphical model, as above.]

• In sLDA the variational bound is

$$\mathbb{E}[\log p(\theta \mid \alpha)] + \sum_{n=1}^{N} \mathbb{E}[\log p(Z_n \mid \theta)] + \sum_{n=1}^{N} \mathbb{E}[\log p(w_n \mid Z_n, \beta_{1:K})] + \mathbb{E}[\log p(y \mid Z_{1:N}, \eta, \sigma^2)] + H(q)$$

• As in Blei, Ng, and Jordan (2003), we use the fully-factorized variational distribution

$$q(\theta, z_{1:N} \mid \gamma, \phi_{1:N}) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)$$
Variational inference in sLDA
• The distinguishing term is

$$\mathbb{E}[\log p(y \mid Z_{1:N}, \eta, \sigma^2)] = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{y^2 - 2y\,\eta^\top \mathbb{E}[\bar{Z}] + \eta^\top \mathbb{E}[\bar{Z}\bar{Z}^\top]\,\eta}{2\sigma^2}$$

• The first expectation is

$$\mathbb{E}[\bar{Z}] = \bar{\phi} := \frac{1}{N}\sum_{n=1}^{N} \phi_n$$

• The second expectation is

$$\mathbb{E}[\bar{Z}\bar{Z}^\top] = \frac{1}{N^2}\left(\sum_{n=1}^{N}\sum_{m \neq n} \phi_n \phi_m^\top + \sum_{n=1}^{N} \mathrm{diag}\{\phi_n\}\right)$$
• Linear in $\phi_n$, which leads to an easy coordinate ascent algorithm.
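Both expectations are cheap to compute from the variational parameters. A numpy sketch, assuming the $\phi_n$ are stacked into an $(N, K)$ array (the function name and layout are assumptions for illustration):

```python
import numpy as np

def zbar_moments(phi):
    """Moments of Zbar under the factorized q; phi is (N, K),
    row n holding the multinomial parameters phi_n."""
    N = phi.shape[0]
    s = phi.sum(axis=0)
    e_zbar = s / N                        # E[Zbar] = phibar
    # Cross terms (n != m) factorize; the n = m terms collapse to
    # diag(phi_n) because each Z_n is an indicator vector.
    e_zzT = (np.outer(s, s) - phi.T @ phi + np.diag(s)) / N**2
    return e_zbar, e_zzT
```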
Maximum likelihood estimation
• The M-step is an MLE under expected sufficient statistics.
• Define
  • $y = y_{1:D}$ is the response vector
  • $A$ is the $D \times K$ matrix whose rows are $\bar{Z}_d^\top$.
• The MLE of the coefficients solves the expected normal equations

$$\mathbb{E}[A^\top A]\,\eta = \mathbb{E}[A]^\top y \;\Rightarrow\; \hat{\eta}_{\text{new}} \leftarrow \left(\mathbb{E}[A^\top A]\right)^{-1} \mathbb{E}[A]^\top y$$

• The MLE of the variance is

$$\hat{\sigma}^2_{\text{new}} \leftarrow (1/D)\left\{y^\top y - y^\top \mathbb{E}[A]\left(\mathbb{E}[A^\top A]\right)^{-1}\mathbb{E}[A]^\top y\right\}$$
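In code, the M-step is a single linear solve. A sketch, assuming the expected sufficient statistics $\mathbb{E}[A]$ and $\mathbb{E}[A^\top A]$ have already been accumulated from the E-step (names are illustrative):

```python
import numpy as np

def slda_mstep(EA, EAtA, y):
    """Solve the expected normal equations.
    EA: E[A], shape (D, K); EAtA: E[A^T A], shape (K, K); y: (D,)."""
    eta_hat = np.linalg.solve(EAtA, EA.T @ y)
    D = y.shape[0]
    sigma2_hat = (y @ y - y @ EA @ eta_hat) / D
    return eta_hat, sigma2_hat
```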
Prediction
• We have fit sLDA parameters to a corpus, using variational EM.
• We have a new document $w_{1:N}$ with unknown response value.
• First, run variational inference in the unsupervised LDA model to obtain $\gamma$ and $\phi_{1:N}$ for the new document. (LDA $\Leftrightarrow$ integrating the unobserved $Y$ out of sLDA.)
• Predict $y$ using the sLDA expected value:

$$\mathbb{E}\left[Y \mid w_{1:N}, \alpha, \beta_{1:K}, \eta, \sigma^2\right] \approx \eta^\top \mathbb{E}_q\left[\bar{Z}\right] = \eta^\top \bar{\phi}$$
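Once inference has produced $\phi_{1:N}$ for the new document, the prediction itself is one line (a sketch, continuing the $(N, K)$ layout used above):

```python
import numpy as np

def slda_predict(phi, eta_hat):
    """phi: (N, K) variational parameters from unsupervised LDA
    inference on the new document; returns eta^T phibar."""
    return eta_hat @ phi.mean(axis=0)
```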
Example: Movie reviews
[Figure: the ten topics arranged by their fitted coefficients (roughly −30 to +20). The most negative topics contain words like least, problem, unfortunately, supposed, worse, flat, dull and awful, featuring, routine; topics near zero contain mostly function words (his, their, one, from, have, like, not, about); the most positive topics contain words like both, motion, simple, perfect, fascinating and however, cinematography, screenplay, performances, effective.]
• 10-topic sLDA model on movie reviews (Pang and Lee, 2005).
• Response: number of stars associated with each review.
• Each component of the coefficient vector $\eta$ is associated with a topic.
Predictive R2
[Plot: predictive $R^2$ (0.0–0.5) as a function of the number of topics (5–50); the sLDA curve is red.]
Held out likelihood
[Plot: per-word held-out log likelihood (−6.42 to −6.37) as a function of the number of topics (5–50); the sLDA curve is red.]
Diverse response types with GLMs
• We want to work with response variables that don’t live in the reals:
  • binary / multiclass classification
  • count data
  • waiting time
• Model the response with a generalized linear model:

$$p(y \mid \zeta, \delta) = h(y, \delta) \exp\left\{\frac{\zeta y - A(\zeta)}{\delta}\right\}, \quad \text{where } \zeta = \eta^\top \bar{z}$$
• Complicates inference, but allows for flexible modeling.
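For example, a binary response uses the Bernoulli (with $\zeta$ the log-odds) and a count response uses the Poisson (with $\zeta$ the log rate). A sketch of the response log probability under these two choices, with dispersion $\delta = 1$ (the function and its interface are illustrative assumptions):

```python
import numpy as np
from math import lgamma

def glm_response_logprob(y, zbar, eta, family="bernoulli"):
    """log p(y | zbar) with natural parameter zeta = eta^T zbar."""
    zeta = eta @ zbar
    if family == "bernoulli":    # binary: A(zeta) = log(1 + e^zeta)
        return y * zeta - np.log1p(np.exp(zeta))
    if family == "poisson":      # counts: A(zeta) = e^zeta, h(y) = 1/y!
        return y * zeta - np.exp(zeta) - lgamma(y + 1)
    raise ValueError(family)
```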
Example: Multi-class classification
[Figure 4 from the paper: example results from the LabelMe dataset. For each scene class (highway, inside city, tall building, street, forest, coast, mountain, open country), the left side shows correctly classified examples with predicted annotations (e.g., highway: car, sign, road), and the right side shows misclassified ones, with the correct class in brackets (e.g., coast (highway): car, sand beach, tree). Italic words indicate the class label; the remaining words are predicted annotations.]
[Figure 2 from the paper: comparisons of average accuracy over all classes, based on 5 random train/test subsets, as a function of the number of topics (20–120). Left: image classification on the LabelMe dataset (accuracy roughly 0.64–0.78). Right: image classification on the UIUC-Sport dataset (roughly 0.56–0.66). multi-class sLDA with annotations and multi-class sLDA (red curves) are both our models; the baselines are Fei-Fei and Perona, 2005 and Bosch et al., 2006.]
3. multi-class sLDA: This is the multi-class sLDA model, described in this paper.
4. multi-class sLDA with annotations: This is multi-class sLDA with annotations, described in this paper.
Note all testing is performed on unlabeled and unannotated images.
The results are illustrated in the graphs of Figure 2 and in the confusion matrices of Figure 3. Our models, multi-class sLDA and multi-class sLDA with annotations, perform better than the other approaches. They reduce the error of Fei-Fei and Perona, 2005 by at least 10% on both data sets, and even more for Bosch et al., 2006. This demonstrates that multi-class sLDA is a better classifier, and that joint modeling does not negatively affect classification accuracy when annotation information is available. In fact, it usually increases the accuracy.
Observe that the model of [5], unsupervised LDA combined with KNN, gives the worst performance of these methods. This highlights the difference between finding topics that are predictive, as our models do, and finding topics in an unsupervised way. The accuracy of unsupervised LDA might be increased by using some of the other visual features suggested by [5]. Here, we restrict ourselves to SIFT features in order to compare models, rather than feature sets.
As the number of topics increases, the multi-class sLDA models (with and without annotation) do not overfit until around 100 topics, while Fei-Fei and Perona, 2005 begins to overfit at 40 topics. This suggests that multi-class sLDA, which combines aspects of both generative and discriminative classification, can handle more latent features than a purely generative approach. On one hand, a large number of topics increases the possibility of overfitting; on the other hand, it provides more latent features for building the classifier. (Other than the topic models listed, we also tested an SVM-based approach using SIFT image features. The SVM yielded much worse performance than the topic models: 47% for the LabelMe data and 20% for the UIUC-Sport data. These are not marked on the plots.)
Image Annotation. In the case of multi-class sLDA with annotations, we can use the same trained model for image annotation. We emphasize that our models are designed for simultaneous classification and annotation. For image annotation, we compare the following two methods:
1. Blei and Jordan, 2003: This is the corr-LDA model from [2], trained on annotated images.
2. multi-class sLDA with annotations: This is exactly the same model trained for image classification in the previous section. In testing annotation, we observe only images.
To measure image annotation performance, we use an evaluation measure from information retrieval. Specifically, we examine the top-N F-measure (2 × precision × recall / (precision + recall)), denoted F-measure@N, where we set N = 5. We find that multi-class sLDA with annotations performs slightly better than corr-LDA over all the numbers of topics tested (about 1% relative improvement). For example, considering models with 100 topics, the LabelMe F-measures are 38.2% (corr-LDA) and 38.7% (multi-class sLDA with annotations); on UIUC-Sport, they are 34.7% (corr-LDA) and 35.0% (multi-class sLDA with annotations).
These results demonstrate that our models can perform classification and annotation with the same latent space. With a single trained model, we find annotation performance that is competitive with the state of the art, and classification performance that is superior.
(sLDA for image classification, with Chong Wang)
Supervised topic models
• sLDA enables model-based regression where the predictor “variable” is a text document.
• It can easily be used wherever LDA is used in an unsupervised fashion (e.g., images, genes, music).
• sLDA is a supervised dimension-reduction technique, whereas LDA performs unsupervised dimension reduction.
• LDA + regression compared to sLDA is like principal components regression compared to partial least squares.
• Paper: Blei and McAuliffe, NIPS 2007.
Relational topic models
[Figure: a citation network; nodes are papers and edges are citations. Call-outs show example documents with the openings of their abstracts, e.g. “Irrelevant features and the subset selection problem,” “Learning with many irrelevant features,” “Evaluation and selection of biases in machine learning,” “Utilizing prior concepts for learning,” “Improving tactical plans with genetic algorithms,” “An evolutionary approach to learning in robots,” and “Using a genetic algorithm to learn strategies for collision avoidance and local navigation.”]

• Many data sets contain connected observations.
• For example:
  • Citation networks of documents
  • Hyperlinked networks of web-pages
  • Friend-connected social network profiles
Relational topic models
[The same citation-network figure as the previous slide.]
• Research has focused on finding communities and patterns in the link-structure of these networks (Kemp et al. 2004, Hoff et al. 2002, Hofman and Wiggins 2007, Airoldi et al. 2008).
• By adapting supervised topic modeling, we can build a good model of content and structure.
• RTMs find related hidden structure in both types of data.
Relational topic models
[Graphical model: documents $d$ and $d'$ each have topic proportions $\theta$, per-word assignments $z$, and words $w$, sharing topics $\beta_{1:K}$ and Dirichlet parameter $\alpha$; a link indicator $y_{d,d'}$ between each pair of documents depends on both documents' topic assignments through parameters $\eta$.]

• A binary response variable is associated with each pair of documents.
• Adapt the variational EM algorithm for sLDA with a binary GLM response model (with different link probability functions).
• Allows predictions that are out of reach for traditional models.
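As a concrete example of a link probability function: the RTM's $\psi_\sigma$ and $\psi_e$ both score a document pair through the elementwise product of their empirical topic frequencies. A sketch under that parameterization (the function name and argument layout are assumptions):

```python
import numpy as np

def link_probability(zbar_d, zbar_dp, eta, nu, psi="sigma"):
    """p(y_{d,d'} = 1 | zbar_d, zbar_d') for the two link functions.
    The pair interacts through the elementwise product, so documents
    that share topics get a higher link probability."""
    s = eta @ (zbar_d * zbar_dp) + nu
    if psi == "sigma":
        return 1.0 / (1.0 + np.exp(-s))   # logistic link
    if psi == "e":
        return np.exp(s)                  # exponential link (needs s <= 0)
    raise ValueError(psi)
```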
Predictive performance of one type given the other
[Plots: link log likelihood (top row) and word log likelihood (bottom row) as functions of the number of topics (5–25) on the Cora, WebKB, and PNAS corpora, comparing RTM ($\psi_\sigma$), RTM ($\psi_e$), LDA + Regression, Mixed-Membership, and Unigram/Bernoulli.]

Cora corpus (McCallum et al., 2000)
Note: Traditional models of networks cannot perform these tasks.
Predictive performance of one type given the other
[The same plots as the previous slide, now highlighting the WebKB panels.]

WebKB corpus (Craven et al., 1998)
Predictive performance of one type given the other
[The same plots as the previous slide, now highlighting the PNAS panels.]

PNAS corpus (courtesy of JSTOR)
Predicting links from documents
Table 2: Top eight link predictions made by RTM ($\psi_e$) and LDA + Regression for two documents (italicized) from Cora. The models were fit with 10 topics. Boldfaced titles indicate actual documents cited by or citing each document. Over the whole corpus, RTM improves precision over LDA + Regression by 80% when evaluated on the first 20 documents retrieved.

Markov chain Monte Carlo convergence diagnostics: A comparative review
  RTM ($\psi_e$):
    Minorization conditions and convergence rates for Markov chain Monte Carlo
    Rates of convergence of the Hastings and Metropolis algorithms
    Possible biases induced by MCMC convergence diagnostics
    Bounding convergence time of the Gibbs sampler in Bayesian image restoration
    Self regenerative Markov chain Monte Carlo
    Auxiliary variable methods for Markov chain Monte Carlo with applications
    Rate of Convergence of the Gibbs Sampler by Gaussian Approximation
    Diagnosing convergence of Markov chain Monte Carlo algorithms
  LDA + Regression:
    Exact Bound for the Convergence of Metropolis Chains
    Self regenerative Markov chain Monte Carlo
    Minorization conditions and convergence rates for Markov chain Monte Carlo
    Gibbs-markov models
    Auxiliary variable methods for Markov chain Monte Carlo with applications
    Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Models
    Mediating instrumental variables
    A qualitative framework for probabilistic inference
    Adaptation for Self Regenerative MCMC

Competitive environments evolve better solutions for complex tasks
  RTM ($\psi_e$):
    Coevolving High Level Representations
    A Survey of Evolutionary Strategies
    Genetic Algorithms in Search, Optimization and Machine Learning
    Strongly typed genetic programming in evolving cooperation strategies
    Solving combinatorial problems using evolutionary algorithms
    A promising genetic algorithm approach to job-shop scheduling…
    Evolutionary Module Acquisition
    An Empirical Investigation of Multi-Parent Recombination Operators…
  LDA + Regression:
    A New Algorithm for DNA Sequence Assembly
    Identification of protein coding regions in genomic DNA
    Solving combinatorial problems using evolutionary algorithms
    A promising genetic algorithm approach to job-shop scheduling…
    A genetic algorithm for passive management
    The Performance of a Genetic Algorithm on a Chaotic Objective Function
    Adaptive global optimization with local search
    Mutation rates as adaptations

Table 2 illustrates suggested citations using RTM ($\psi_e$) and LDA + Regression as predictive models. These suggestions were computed from a model fit on one of the folds of the Cora data. The top results illustrate suggested links for “Markov chain Monte Carlo convergence diagnostics: A comparative review.”
Given a new document, which documents is it likely to link to?
Spatially consistent topics

[Fig 5 from the paper: a comparison between RTM (left) and LDA (right) of topic distributions on local news data. Each color/row depicts a single topic (Topics 1–5). Each state's color intensity indicates the magnitude of that topic's component. The corresponding words associated with each topic are given in Table 3. Whereas LDA finds geographically diffuse topics, RTM, by modeling spatial connectivity, finds coherent regions.]
• Links are the adjacency matrix of states.
• Documents are geographically-tagged news articles from Yahoo! They are not exchangeable.
• RTM finds spatially consistent topics.
Summary
• Relational topic modeling allows us to analyze connected documents, or other data for which the mixed-membership assumptions are appropriate.
• Traditional models cannot predict with new and unlinked data.
• RTMs allow for such predictions:
  • links given the words of a new document
  • words given the links of a new document
• Paper: Chang and Blei, AISTATS 2009.
JSTOR Discipline Analysis
• Another kind of “supervised” topic model is one where the topics have external meaning (Ramage and Manning, 2009).
• JSTOR attaches each journal to a discipline.
  • E.g., JASA is in Statistics; PNAS is in General Science.
• We can measure how interdisciplinary the articles are (see the sketch below):
  • Compute a distribution over terms for each discipline
  • Find topic proportions for each article
• (Work done by Sean Gerrish)
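A minimal sketch of that pipeline, assuming each discipline's term distribution is estimated by normalizing word counts over the journals JSTOR assigns to it, and each article's proportions come from EM with those distributions held fixed (all names here are illustrative):

```python
import numpy as np

def discipline_proportions(doc_counts, disc_term_dists, n_iter=50):
    """Estimate one article's mixture proportions over fixed
    per-discipline term distributions via EM.
    doc_counts: length-V term-count vector for the article;
    disc_term_dists: (K, V) array whose rows sum to one."""
    K = disc_term_dists.shape[0]
    theta = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each discipline for each term
        r = theta[:, None] * disc_term_dists
        r /= r.sum(axis=0, keepdims=True) + 1e-12
        # M-step: new proportions from expected counts
        theta = r @ doc_counts
        theta /= theta.sum()
    return theta
```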
[Screenshot: the JSTOR Discipline Browser (http://dbrowser.jstor.org), showing results for the query “empirical bayes.” Sliders adjust per-discipline weights (Finance, Business, Statistics, Economics, Philosophy, Psychology, and others), and matching articles are listed with their journals' disciplines, e.g. “Tax Innovation in the States: Capitalizing on Political Opportunity” (American Journal of Political Science, 1992; Political Science) and “Chaos and Nonlinear Forecastability in Economics and Finance” (Philosophical Transactions, 1994; Mathematics, Biological Sciences, General Science). Index size: 18032 documents.]
General Science over Time
[Plots: log proportion over time of topics from each discipline, for articles in PNAS (1920–2000) and Science (1880–2000). Disciplines shown: Biological Sciences, Botany & Plant Sciences, Developmental & Cell Biology, Ecology & Evolutionary Biology, General Science, Geography, History of Science & Technology, and Mathematics.]
Navel Gazing
[Plots: log proportion over time of topics from each discipline, for articles in the Journal of the American Statistical Association (1940–2000) and The Annals of Statistics (1975–2005). Disciplines shown: Business, Economics, Finance, Mathematics, and Statistics.]
“We should seek out unfamiliar summaries of observational material, and establish their useful properties... And still more novelty can come from finding, and evading, still deeper lying constraints.”
(John Tukey, The Future of Data Analysis, 1962)