LDA on Social Bookmarking Systems
TRANSCRIPT
LDA on Social Bookmarking Systems: an experiment on CiteUlike
Introduction to Natural Language Processing (CS2731), Professor Rebecca Hwa, University of Pittsburgh. Denis Parra-Santander, December 16th, 2009.
Outline
- Topic Modeling Joke (check the mood of the people…)
- LDA Introduction ("Sorry, I'm nervous…")
- Motivation ("Smart statement…")
- Definitions
  - Monte Carlo: a great place to spend your vacations
  - Dirichlet: [diʀiˈkleː] ("Uuuh…" "Uuuh…")
- Experiments
- Evaluation method
- Results
- END
Topic modeling: Evolution

- LSA [Deerwester et al. 90]: finds "latent" structure or "concepts" in a text corpus. Compares texts using a vector-based representation learned from the corpus; relies on SVD for dimensionality reduction.
- PLSA [Hofmann 99]: extends LSA by adding the idea of mixture decomposition derived from a latent class model.
- LDA [Blei et al. 2003]: extends PLSA by using a fully generative model, in particular by adding a Dirichlet prior.
LDA: Generative Model* (I/II)

Document 22. Words: Information About catalog pricing changes 2009 welcome looking hands-on science ideas try kitchen

- LDA assumes that each word in the document was generated by first choosing a topic, where each topic is a distribution over words:
  - Topic 15: science experiment learning ideas practice information
  - Topic 9: catalog shopping buy internet checkout cart
- Paired with an inference mechanism (Gibbs sampling), it learns per-document distributions over topics and per-topic distributions over words.
*Original slide by Daniel Ramage, Stanford University
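The generative story above can be sketched in a few lines of Python. This is an illustrative toy, not the experiment's setup: the corpus sizes and hyperparameter values below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, invented for illustration (not from the experiments).
n_docs, n_topics, vocab_size, doc_len = 5, 3, 10, 20
alpha, beta = 0.5, 0.1

# Per-topic distributions over words: phi_j ~ Dirichlet(beta)
phi = rng.dirichlet([beta] * vocab_size, size=n_topics)
# Per-document distributions over topics: theta_d ~ Dirichlet(alpha)
theta = rng.dirichlet([alpha] * n_topics, size=n_docs)

docs = []
for d in range(n_docs):
    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta[d])  # topic assignment for this word
        w = rng.choice(vocab_size, p=phi[z])  # word drawn from that topic
        words.append(int(w))
    docs.append(words)
```

Inference (e.g. Gibbs sampling) runs this story in reverse, recovering theta and phi from the observed words alone.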
LDA I/II : Graphical Model
Graphical model representations

- Expanded notation: Cat → w1 w2 w3 w4 … wn
- Compact notation: Cat → wi, drawn inside a "plate" with count n, read as "generate a word from Cat n times"

*Original slide by Roger Levy, UCSD
LDA II/II : Graphical Model
- θ(d) ∼ Dirichlet(α): distribution over topics for each document
- zi ∼ Discrete(θ(d)): topic assignment for each word
- φ(j) ∼ Dirichlet(β): distribution over words for each topic
- wi ∼ Discrete(φ(zi)): word generated from its assigned topic
- α, β: Dirichlet priors

(Plates: D documents with Nd words each; T topics.)

*Original slide by Roger Levy, UCSD
Learning the parameters
- Maximum likelihood estimation (EM)
  - e.g. Hofmann (1999)
- Deterministic approximate algorithms
  - variational EM; Blei, Ng & Jordan (2001; 2003)
  - expectation propagation; Minka & Lafferty (2002)
- Markov chain Monte Carlo
  - full Gibbs sampler; Pritchard et al. (2000)
  - collapsed Gibbs sampler; Griffiths & Steyvers (2004)

*Original slide by Roger Levy, UCSD
My Experiments

- Identify topics in a collection of documents from a social bookmarking system (CiteULike) [Ramage et al. 2008]
- Objective: cluster documents with LDA
- QUESTION: if the documents have, in addition to title and text, USER TAGS… how can the tags help/influence/improve topic identification and clustering?
Tools available
Many implementations of LDA based on Gibbs sampling:

- LingPipe (Java)
- Mallet (Java)
- STMT (Scala) – I chose this one
The Dataset

Initially:
- Corpus: ~45k documents
- Definition of 99 topics (queries)
- Gold standard: document-topic identification by expert feedback, defining a ground truth

But then, the gold standard and RAM…
- Not all documents were relevant
- Unable to train the model with 45k, 20k, or 10k documents

And then, the tags: not all the documents in the gold standard had associated tags (#>2)
- Finally: training with 1.1k documents
- Experiments on 212 documents
Evaluation: Pair-wise precision / recall
*Original slide by Daniel Ramage, Stanford University
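Pair-wise precision/recall treats every pair of documents as one decision: a true positive is a pair that both the inferred clustering and the gold standard place in the same cluster. A minimal sketch (the function name and the list-of-labels input format are my own, not from the slides):

```python
from itertools import combinations

def pairwise_prf(pred, gold):
    """Pair-wise precision, recall, and F1 between a predicted clustering
    and a gold-standard one, both given as per-document cluster labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1          # pair correctly placed together
        elif same_pred:
            fp += 1          # pair wrongly placed together
        elif same_gold:
            fn += 1          # pair wrongly separated
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, `pairwise_prf([0, 0, 1, 1], [0, 0, 0, 1])` gives precision 0.5, recall 1/3, and F1 0.4.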
… results …
Perplexity

| | 38 topics | 52 topics | 99 topics |
| --- | --- | --- | --- |
| Content tags | 1860.7642 | 1880.7974 | 1270.8032 |
| Title + text | 2526.7589 | 2447.5477 | 2755.1329 |

Using the Stanford Topic Modeling Toolbox (STMT). Training with ~1.1k documents: 80% for training, 20% to calculate perplexity.
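Perplexity here is the standard held-out measure, pp = exp(-(1/N) Σ log p(wᵢ)) over the held-out tokens; lower is better. A minimal helper (the function name is mine), assuming per-token log-likelihoods from the trained model are already available:

```python
import math

def perplexity(token_log_likelihoods):
    """Held-out perplexity: exp of the negative mean per-token log-likelihood."""
    n = len(token_log_likelihoods)
    return math.exp(-sum(token_log_likelihoods) / n)

# Sanity check: if every held-out token had probability 1/2,
# the perplexity would be exactly 2.
print(perplexity([math.log(0.5)] * 100))
```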
F1 (& precision/recall)

F1, with precision and recall in parentheses:

| | 38 topics | 52 topics | 99 topics |
| --- | --- | --- | --- |
| Tags | 0.139 (0.118/0.167) | 0.168 (0.187/0.152) | 0.215 (0.267/0.18) |
| Title + text | 0.1252 (0.122/0.128) | 0.157 (0.151/0.163) | 0.156 (0.198/0.129) |
Conclusions

- Results are not the same as in the "motivational" paper, though they are consistent with its conclusions (the dataset is very domain-specific)
- Pending: combining tags and documents, in particular MM-LDA
- Importance to NLP: extensions of the model have been used to:
  - learn syntactic and semantic factors that guide word choice
  - identify authorship
  - many others
… and to finish … Thanks! And…
"Invent new worlds and watch your word;
The adjective, when it doesn't give life, kills…"

Ars Poetica, Vicente Huidobro

(Original Spanish: "Inventa nuevos mundos y cuida tu palabra; El adjetivo, cuando no da vida, mata…")
References

- Heinrich, G. (2008). Parameter estimation for text analysis. Technical report, University of Leipzig.
- Ramage, D., P. Heymann, C. D. Manning, and H. Garcia-Molina (2009). Clustering the tagged web. In WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 54-63. ACM.
- Steyvers, M. and T. Griffiths (2007). Probabilistic Topic Models. Lawrence Erlbaum Associates.
Backup Slides
LSA: 3 claims (2 match with LDA)

- Semantic information can be derived from a word-document co-occurrence matrix
- Dimensionality reduction is an essential part of this derivation
- Words and documents can be represented as points in a Euclidean space ⇒ different from LDA, where the semantic properties of words and docs are expressed in terms of probabilistic topics
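The first two claims can be illustrated with a toy word-document co-occurrence matrix (counts invented for the example): a truncated SVD yields low-dimensional document vectors that can then be compared in the resulting Euclidean space.

```python
import numpy as np

# Toy word-document co-occurrence matrix (rows = words, columns = documents).
# Counts are invented: documents 0 and 2 share vocabulary, as do 1 and 3.
X = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])

# Truncated SVD: X ~ U_k S_k V_k^T -- the dimensionality reduction LSA relies on.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional point per document

def cos(a, b):
    """Cosine similarity between two latent document vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

In this toy matrix, documents 0 and 2 come out far more similar to each other than either is to document 1, matching their shared vocabulary.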
Parameter estimation and Gibbs Sampling (3 Slides)
Inverting the generative model

- Maximum likelihood estimation (EM)
  - e.g. Hofmann (1999)
- Deterministic approximate algorithms
  - variational EM; Blei, Ng & Jordan (2001; 2003)
  - expectation propagation; Minka & Lafferty (2002)
- Markov chain Monte Carlo
  - full Gibbs sampler; Pritchard et al. (2000)
  - collapsed Gibbs sampler; Griffiths & Steyvers (2004)
The collapsed Gibbs sampler
Using conjugacy of Dirichlet and multinomial distributions, integrate out continuous parameters
Defines a distribution on discrete ensembles z
$$P(\mathbf{w}\mid\mathbf{z}) = \int_{\Delta} P(\mathbf{w}\mid\mathbf{z},\Phi)\,p(\Phi)\,d\Phi = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{\!T} \prod_{j=1}^{T} \frac{\prod_{w}\Gamma\!\left(n_{j}^{(w)}+\beta\right)}{\Gamma\!\left(n_{j}^{(\cdot)}+W\beta\right)}$$

$$P(\mathbf{z}) = \int_{\Delta} P(\mathbf{z}\mid\Theta)\,p(\Theta)\,d\Theta = \left(\frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}\right)^{\!D} \prod_{d=1}^{D} \frac{\prod_{j}\Gamma\!\left(n_{d}^{(j)}+\alpha\right)}{\Gamma\!\left(n_{d}^{(\cdot)}+T\alpha\right)}$$

$$P(\mathbf{z}\mid\mathbf{w}) = \frac{P(\mathbf{w}\mid\mathbf{z})\,P(\mathbf{z})}{\sum_{\mathbf{z}'}P(\mathbf{w}\mid\mathbf{z}')\,P(\mathbf{z}')}$$

Here $n_{j}^{(w)}$ counts assignments of word $w$ to topic $j$, and $n_{d}^{(j)}$ counts words in document $d$ assigned to topic $j$; a dot marks summation over that index.
The collapsed Gibbs sampler
Sample each zi conditioned on z−i

This is nicer than your average Gibbs sampler:
- memory: counts can be cached in two sparse matrices
- optimization: no special functions, simple arithmetic
- the distributions on Φ and Θ are analytic given z and w, and can later be found for each sample
$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+W\beta}\cdot\frac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}$$

where the counts $n_{-i}$ exclude the current assignment of word $i$.
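The sampling update above can be turned into a toy implementation: one sweep visits every token, decrements its counts to obtain the z₋ᵢ state, and resamples its topic from the conditional. This is an illustrative sketch, not STMT's implementation; the per-document denominator is omitted because it does not depend on the candidate topic j.

```python
import numpy as np

def gibbs_sweep(docs, z, n_wt, n_dt, n_t, alpha, beta, rng):
    """One collapsed-Gibbs sweep over every token.
    docs: list of documents, each a list of word ids
    z:    parallel list of current topic assignments
    n_wt[w, t], n_dt[d, t], n_t[t]: the cached count matrices."""
    W, T = n_wt.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            # remove this token's contribution -> counts for z_{-i}
            n_wt[w, t] -= 1; n_dt[d, t] -= 1; n_t[t] -= 1
            # P(z_i = j | z_{-i}, w) ∝ (n_wt + β)/(n_t + Wβ) · (n_dt + α)
            # (the per-document denominator is constant in j, so omitted)
            p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
            t = rng.choice(T, p=p / p.sum())
            z[d][i] = t
            n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1

# Toy corpus: 2 documents over a 4-word vocabulary, T = 2 topics.
rng = np.random.default_rng(0)
docs = [[0, 1, 2], [2, 3, 0]]
W, T = 4, 2
z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
n_wt = np.zeros((W, T)); n_dt = np.zeros((len(docs), T)); n_t = np.zeros(T)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        n_wt[w, z[d][i]] += 1; n_dt[d, z[d][i]] += 1; n_t[z[d][i]] += 1

for _ in range(50):   # a few sweeps of the sampler
    gibbs_sweep(docs, z, n_wt, n_dt, n_t, alpha=0.1, beta=0.01, rng=rng)
```

After any number of sweeps, the counts remain consistent with the assignments, which is the invariant the cached matrices rely on.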
Gibbs Sampling from PTM paper
Extensions and Applications
Extension: a model for meetings (Purver, Kording, Griffiths, & Tenenbaum, 2006)

- θ(u) | su = 0 ∼ Delta(θ(u−1))
- θ(u) | su = 1 ∼ Dirichlet(α)
- zi ∼ Discrete(θ(u))
- φ(j) ∼ Dirichlet(β)
- wi ∼ Discrete(φ(zi))

(Plates: U utterances with Nu words each; T topics. The switch su decides whether utterance u keeps the previous topic distribution θ(u−1) or draws a fresh one.)
Sample of ICSI meeting corpus (25 meetings)
no it's o_k. it's it'll work. well i can do that. but then i have to end the presentation in the middle so i can go back to open up javabayes. o_k fine. here let's see if i can. alright. very nice. is that better. yeah. o_k. uh i'll also get rid of this click to add notes. o_k. perfect NEW TOPIC (not supplied to algorithm) so then the features we decided or we decided we were talked about. right. uh the the prosody the discourse verb choice. you know we had a list of things like to go and to visit and what not. the landmark-iness of uh. i knew you'd like that. nice coinage
Topic segmentation applied to meetings
Inferred Segmentation
Inferred Topics
Comparison with human judgments
Topics recovered are much more coherent than those found using random segmentation, no segmentation, or an HMM
Learning the number of topics
- Can use standard Bayes factor methods to evaluate models of different dimensionality
  - e.g. importance sampling via MCMC
- Alternative: nonparametric Bayes
  - fixed number of topics per document, unbounded number of topics per corpus (Blei, Griffiths, Jordan, & Tenenbaum, 2004)
  - unbounded number of topics for both (the hierarchical Dirichlet process) (Teh, Jordan, Beal, & Blei, 2004)
The Author-Topic model (Rosen-Zvi, Griffiths, Smyth, & Steyvers, 2004)
- xi ∼ Uniform(A(d)): the author of each word is chosen uniformly at random from the document's authors
- θ(a) ∼ Dirichlet(α): each author has a distribution over topics
- zi ∼ Discrete(θ(xi)): topic assignment for each word
- φ(j) ∼ Dirichlet(β): distribution over words for each topic
- wi ∼ Discrete(φ(zi)): word generated from the assigned topic

(Plates: D documents with Nd words each; T topics; A authors.)
Four example topics from NIPS
Top words per topic:

| TOPIC 19 | prob. | TOPIC 24 | prob. | TOPIC 29 | prob. | TOPIC 87 | prob. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LIKELIHOOD | 0.0539 | RECOGNITION | 0.0400 | REINFORCEMENT | 0.0411 | KERNEL | 0.0683 |
| MIXTURE | 0.0509 | CHARACTER | 0.0336 | POLICY | 0.0371 | SUPPORT | 0.0377 |
| EM | 0.0470 | CHARACTERS | 0.0250 | ACTION | 0.0332 | VECTOR | 0.0257 |
| DENSITY | 0.0398 | TANGENT | 0.0241 | OPTIMAL | 0.0208 | KERNELS | 0.0217 |
| GAUSSIAN | 0.0349 | HANDWRITTEN | 0.0169 | ACTIONS | 0.0208 | SET | 0.0205 |
| ESTIMATION | 0.0314 | DIGITS | 0.0159 | FUNCTION | 0.0178 | SVM | 0.0204 |
| LOG | 0.0263 | IMAGE | 0.0157 | REWARD | 0.0165 | SPACE | 0.0188 |
| MAXIMUM | 0.0254 | DISTANCE | 0.0153 | SUTTON | 0.0164 | MACHINES | 0.0168 |
| PARAMETERS | 0.0209 | DIGIT | 0.0149 | AGENT | 0.0136 | REGRESSION | 0.0155 |
| ESTIMATE | 0.0204 | HAND | 0.0126 | DECISION | 0.0118 | MARGIN | 0.0151 |

Top authors per topic:

| TOPIC 19 | prob. | TOPIC 24 | prob. | TOPIC 29 | prob. | TOPIC 87 | prob. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Tresp_V | 0.0333 | Simard_P | 0.0694 | Singh_S | 0.1412 | Smola_A | 0.1033 |
| Singer_Y | 0.0281 | Martin_G | 0.0394 | Barto_A | 0.0471 | Scholkopf_B | 0.0730 |
| Jebara_T | 0.0207 | LeCun_Y | 0.0359 | Sutton_R | 0.0430 | Burges_C | 0.0489 |
| Ghahramani_Z | 0.0196 | Denker_J | 0.0278 | Dayan_P | 0.0324 | Vapnik_V | 0.0431 |
| Ueda_N | 0.0170 | Henderson_D | 0.0256 | Parr_R | 0.0314 | Chapelle_O | 0.0210 |
| Jordan_M | 0.0150 | Revow_M | 0.0229 | Dietterich_T | 0.0231 | Cristianini_N | 0.0185 |
| Roweis_S | 0.0123 | Platt_J | 0.0226 | Tsitsiklis_J | 0.0194 | Ratsch_G | 0.0172 |
| Schuster_M | 0.0104 | Keeler_J | 0.0192 | Randlov_J | 0.0167 | Laskov_P | 0.0169 |
| Xu_L | 0.0098 | Rashid_M | 0.0182 | Bradtke_S | 0.0161 | Tipping_M | 0.0153 |
| Saul_L | 0.0094 | Sackinger_E | 0.0132 | Schwartz_A | 0.0142 | Sollich_P | 0.0141 |
Who wrote what?
A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly related to the input1 space This is done by identifying a class of kernels1 which can be represented as norm1 based2 distances1 in Hilbert spaces It turns1 out that common kernel1 algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and can be run2 with that class of kernels1 too As well as providing1 a useful new insight1 into how these algorithms work the present2 work can form the basis1 for conceiving new algorithms
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2 description2 is augmented with a system2 structure2 a directed2 graph2 explicating the interconnections between system2 components2 Specifically we first introduce the notion of a consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2 Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal form NNF and discuss its merits compared to standard variations Third we introduce a basic algorithm2 for computing consequences in NNF given a structured system2 description We show that if the system2 structure2 does not contain cycles2 then there is always a linear size2 consequence2 in NNF which can be computed in linear time2 For arbitrary1 system2 structures2 we show a precise connection between the complexity2 of computing2 consequences and the topology of the underlying system2 structure2 Finally we present2 an algorithm2 that enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The algorithm2 is shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies some general conditions
Written by (1) Scholkopf_B
Written by (2) Darwiche_A
Analysis of PNAS abstracts
- Test topic models with a real database of scientific papers from PNAS
- All 28,154 abstracts from 1991-2001
- All words occurring in at least five abstracts and not on a "stop" list (20,551 word types)
- Total of 3,026,970 tokens in the corpus

(Griffiths & Steyvers, 2004)
A selection of topics (top words per topic):

- FORCE SURFACE MOLECULES SOLUTION SURFACES MICROSCOPY WATER FORCES PARTICLES STRENGTH POLYMER IONIC ATOMIC AQUEOUS MOLECULAR PROPERTIES LIQUID SOLUTIONS BEADS MECHANICAL
- HIV VIRUS INFECTED IMMUNODEFICIENCY CD4 INFECTION HUMAN VIRAL TAT GP120 REPLICATION TYPE ENVELOPE AIDS REV BLOOD CCR5 INDIVIDUALS ENV PERIPHERAL
- MUSCLE CARDIAC HEART SKELETAL MYOCYTES VENTRICULAR MUSCLES SMOOTH HYPERTROPHY DYSTROPHIN HEARTS CONTRACTION FIBERS FUNCTION TISSUE RAT MYOCARDIAL ISOLATED MYOD FAILURE
- STRUCTURE ANGSTROM CRYSTAL RESIDUES STRUCTURES STRUCTURAL RESOLUTION HELIX THREE HELICES DETERMINED RAY CONFORMATION HELICAL HYDROPHOBIC SIDE DIMENSIONAL INTERACTIONS MOLECULE SURFACE
- NEURONS BRAIN CORTEX CORTICAL OLFACTORY NUCLEUS NEURONAL LAYER RAT NUCLEI CEREBELLUM CEREBELLAR LATERAL CEREBRAL LAYERS GRANULE LABELED HIPPOCAMPUS AREAS THALAMIC
- TUMOR CANCER TUMORS HUMAN CELLS BREAST MELANOMA GROWTH CARCINOMA PROSTATE NORMAL CELL METASTATIC MALIGNANT LUNG CANCERS MICE NUDE PRIMARY OVARIAN
Cold topics / Hot topics (topic number followed by top words):

- Topic 2: SPECIES GLOBAL CLIMATE CO2 WATER ENVIRONMENTAL YEARS MARINE CARBON DIVERSITY OCEAN EXTINCTION TERRESTRIAL COMMUNITY ABUNDANCE
- Topic 134: MICE DEFICIENT NORMAL GENE NULL MOUSE TYPE HOMOZYGOUS ROLE KNOCKOUT DEVELOPMENT GENERATED LACKING ANIMALS REDUCED
- Topic 179: APOPTOSIS DEATH CELL INDUCED BCL CELLS APOPTOTIC CASPASE FAS SURVIVAL PROGRAMMED MEDIATED INDUCTION CERAMIDE EXPRESSION
- Topic 37: CDNA AMINO SEQUENCE ACID PROTEIN ISOLATED ENCODING CLONED ACIDS IDENTITY CLONE EXPRESSED ENCODES RAT HOMOLOGY
- Topic 289: KDA PROTEIN PURIFIED MOLECULAR MASS CHROMATOGRAPHY POLYPEPTIDE GEL SDS BAND APPARENT LABELED IDENTIFIED FRACTION DETECTED
- Topic 75: ANTIBODY ANTIBODIES MONOCLONAL ANTIGEN IGG MAB SPECIFIC EPITOPE HUMAN MABS RECOGNIZED SERA EPITOPES DIRECTED NEUTRALIZING
The effect of α and β as hyperparameters
Effects of hyperparameters

- α and β control the relative sparsity of Φ and Θ
  - smaller α: fewer topics per document
  - smaller β: fewer words per topic
- Good assignments z are a compromise between the two kinds of sparsity
(Figure: plot of log Γ(x) as a function of x.)

$$P(\mathbf{w}\mid\mathbf{z},\beta) = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{\!T}\prod_{j=1}^{T}\frac{\prod_{w}\Gamma\!\left(n_{j}^{(w)}+\beta\right)}{\Gamma\!\left(n_{j}^{(\cdot)}+W\beta\right)}$$

$$P(\mathbf{z}\mid\alpha) = \left(\frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}\right)^{\!D}\prod_{d=1}^{D}\frac{\prod_{j}\Gamma\!\left(n_{d}^{(j)}+\alpha\right)}{\Gamma\!\left(n_{d}^{(\cdot)}+T\alpha\right)}$$
Varying α: decreasing α increases sparsity
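This sparsity effect can be checked numerically: draw symmetric Dirichlet samples at a few concentration values (the dimension and sample counts below are arbitrary toy choices) and measure how much mass the largest component carries on average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Smaller concentration parameter -> sparser samples:
# more of the probability mass concentrates on a few components.
mean_max = {}
for a in (10.0, 1.0, 0.1):
    theta = rng.dirichlet([a] * 20, size=1000)   # 1000 draws from Dir(a, ..., a)
    mean_max[a] = theta.max(axis=1).mean()       # avg mass of the largest component
    print(f"alpha={a:>4}: mean largest component = {mean_max[a]:.2f}")
```

As α shrinks, the average mass of the largest component grows, i.e. the samples become sparser.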
Varying β: decreasing β increases sparsity (?)
Multi-Multinomial LDA (MM-LDA)
Ramage 2009 results