Generative Topic Models for Community Analysis
Pilfered from: Ramesh Nallapati, http://www.cs.cmu.edu/~wcohen/10-802/lda-sep-18.ppt
Objectives
• Cultural literacy for ML:
– Q: What are “topic models”?
– A1: popular indoor sport for machine learning researchers
– A2: a particular way of applying unsupervised learning of Bayes nets to text
• Quick historical survey of some sample papers in the area
Outline
• Part I: Introduction to Topic Models
  – Naive Bayes model
  – Mixture Models
    • Expectation Maximization
  – PLSA
  – LDA
    • Variational EM
    • Gibbs Sampling
• Part II: Topic Models for Community Analysis
  – Citation modeling with PLSA
  – Citation Modeling with LDA
  – Author Topic Model
  – Author Topic Recipient Model
  – Modeling influence of Citations
  – Mixed membership Stochastic Block Model
Introduction to Topic Models
• Multinomial Naïve Bayes
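[Plate diagram: class node C with children W1, W2, W3, …, WN, inside a plate of size M]
• For each document $d = 1, \dots, M$
  • Generate $C_d \sim \mathrm{Mult}(\cdot \mid \pi)$
  • For each position $n = 1, \dots, N_d$
    • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta, C_d)$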
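The generative process on this slide can be sketched as a small sampler. This is an illustrative sketch only: the function name and the toy parameters are hypothetical, and `pi` and `beta` stand for the class prior and the per-class word distributions, symbols that are assumed here because the originals were lost in extraction.

```python
import random

def sample_naive_bayes_corpus(pi, beta, doc_lengths, seed=0):
    """Sample (class, words) pairs from a multinomial naive Bayes model.

    pi          -- class prior, pi[c] = P(C = c)            (assumed symbol)
    beta        -- beta[c][v] = P(w = v | C = c)            (assumed symbol)
    doc_lengths -- N_d for each of the M documents
    """
    rng = random.Random(seed)
    classes = range(len(pi))
    vocab = range(len(beta[0]))
    corpus = []
    for n_d in doc_lengths:
        # C_d ~ Mult(. | pi): one class per document
        c_d = rng.choices(classes, weights=pi, k=1)[0]
        # w_n ~ Mult(. | beta, C_d): every word shares the document's class
        words = rng.choices(vocab, weights=beta[c_d], k=n_d)
        corpus.append((c_d, words))
    return corpus

# Toy example: two classes over a three-word vocabulary.
corpus = sample_naive_bayes_corpus(
    pi=[0.5, 0.5],
    beta=[[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    doc_lengths=[5, 8, 3],
)
```

Because a single class is drawn per document, every word in a document comes from the same word distribution, which is the limitation the later slides relax.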
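Introduction to Topic Models
• Naïve Bayes Model: Compact representation
[Plate diagram: the unrolled model (C with children W1, W2, W3, …, WN, in a plate of size M) shown next to its compact form (C → W, with W in a plate of size N, both in a plate of size M)]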
Introduction to Topic Models
• Mixture model: unsupervised naïve Bayes model
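[Plate diagram: C → W, with W in a plate of size N, both in a plate of size M]
• Joint probability of words and classes:
• But classes are not visible, so the class becomes a latent variable Z: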
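The two formulas that these bullets introduce did not survive extraction. For a multinomial mixture they take the standard form below, assuming $\pi$ is the class prior, $\beta$ the per-class word distributions, and $K$ the number of classes (these symbols are assumptions, not recovered from the slide):

```latex
% Joint probability of the words of document d and its class
P(\mathbf{w}_d, C_d) = \pi_{C_d} \prod_{n=1}^{N_d} \beta_{C_d, w_n}

% Classes are not observed, so they are summed out
P(\mathbf{w}_d) = \sum_{k=1}^{K} \pi_k \prod_{n=1}^{N_d} \beta_{k, w_n}
```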
Introduction to Topic Models
Introduction to Topic Models
• Probabilistic Latent Semantic Analysis Model
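[Plate diagram: d → z → w, with z and w in a plate of size N, all in a plate of size M; $\theta_d$ is labeled "Topic distribution"]
• Select document $d \sim \mathrm{Mult}(\pi)$
• For each position $n = 1, \dots, N_d$
  • Generate $z_n \sim \mathrm{Mult}(\cdot \mid \theta_d)$
  • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta_{z_n})$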
Introduction to Topic Models
• Probabilistic Latent Semantic Analysis Model
  – Learning using EM
  – Not a complete generative model
    • Has a distribution over the training set of documents: no new document can be generated!
  – Nevertheless, more realistic than mixture model
    • Documents can discuss multiple topics!
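As a rough illustration of "learning using EM", the sketch below fits PLSA to a small document-word count matrix. It is a minimal sketch under stated assumptions, not the slides' implementation: the function names are hypothetical, `theta[d][k]` and `beta[k][v]` stand for $P(z = k \mid d)$ and $P(w = v \mid z = k)$, the distribution over documents is left out, and a tiny constant (`1e-12`) is added to the expected counts to avoid division by zero.

```python
import math
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def plsa_em(counts, num_topics, iters=50, seed=0):
    """Fit PLSA by EM; counts[d][v] is the count of word v in document d."""
    rng = random.Random(seed)
    M, V, K = len(counts), len(counts[0]), num_topics
    # Random positive initialization of both parameter tables
    theta = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(M)]
    beta = [normalize([rng.random() + 0.1 for _ in range(V)]) for _ in range(K)]
    for _ in range(iters):
        exp_dk = [[1e-12] * K for _ in range(M)]  # expected topic counts per document
        exp_kv = [[1e-12] * V for _ in range(K)]  # expected word counts per topic
        for d in range(M):
            for v in range(V):
                if counts[d][v] == 0:
                    continue
                # E-step: posterior P(z = k | d, w = v)
                post = normalize([theta[d][k] * beta[k][v] for k in range(K)])
                for k in range(K):
                    exp_dk[d][k] += counts[d][v] * post[k]
                    exp_kv[k][v] += counts[d][v] * post[k]
        # M-step: renormalize the expected counts
        theta = [normalize(row) for row in exp_dk]
        beta = [normalize(row) for row in exp_kv]
    return theta, beta

def log_likelihood(counts, theta, beta):
    """Log-likelihood of the observed counts under the fitted model."""
    return sum(
        c * math.log(sum(theta[d][k] * beta[k][v] for k in range(len(beta))))
        for d, row in enumerate(counts)
        for v, c in enumerate(row)
        if c > 0
    )
```

Note that `theta` has one row per training document, which is exactly why the model cannot generate a new document: there is no distribution from which to draw a fresh `theta` row.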
Introduction to Topic Models
• PLSA topics (TDT-1 corpus)
Introduction to Topic Models
Introduction to Topic Models
• Latent Dirichlet Allocation
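[Plate diagram: θ → z → w, with z and w in a plate of size N and θ in a plate of size M]
• For each document $d = 1, \dots, M$
  • Generate $\theta_d \sim \mathrm{Dir}(\cdot \mid \alpha)$
  • For each position $n = 1, \dots, N_d$
    • Generate $z_n \sim \mathrm{Mult}(\cdot \mid \theta_d)$
    • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta_{z_n})$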
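The LDA generative process above can be sketched with the standard library alone. The helper names are hypothetical; the Dirichlet draw uses the usual construction of normalizing independent Gamma variates, and `alpha` and `beta` follow the conventional LDA notation, which is assumed here.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def sample_lda_document(alpha, beta, length, rng):
    """Sample one document: returns theta_d and a list of (topic, word) pairs."""
    theta = sample_dirichlet(alpha, rng)  # theta_d ~ Dir(. | alpha)
    topics = range(len(beta))
    vocab = range(len(beta[0]))
    doc = []
    for _ in range(length):
        z = rng.choices(topics, weights=theta, k=1)[0]   # z_n ~ Mult(. | theta_d)
        w = rng.choices(vocab, weights=beta[z], k=1)[0]  # w_n ~ Mult(. | beta_{z_n})
        doc.append((z, w))
    return theta, doc

rng = random.Random(0)
theta, doc = sample_lda_document(
    alpha=[0.5, 0.5],
    beta=[[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    length=10,
    rng=rng,
)
```

Unlike PLSA, the topic proportions are drawn from a prior rather than looked up per training document, so the same function can produce any number of new documents.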
Introduction to Topic Models
• Latent Dirichlet Allocation
  – Overcomes the issues with PLSA
    • Can generate any random document
  – Parameter learning:
    • Variational EM
      – Numerical approximation using lower-bounds
      – Results in biased solutions
      – Convergence has numerical guarantees
    • Gibbs Sampling
      – Stochastic simulation
      – Unbiased solutions
      – Stochastic convergence
Introduction to Topic Models
• Variational EM for LDA
  – Approximate the posterior by a simpler distribution
    • A convex function in each parameter!
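The slide's formulas were not recovered. As a reference point, the usual mean-field construction for LDA can be written as below. Here $\gamma$ and $\phi$ are the free variational parameters (a Dirichlet parameter and per-word topic responsibilities); this notation is an assumption taken from the standard presentation of LDA, not from the slide.

```latex
% Factorized approximation to the posterior over theta and z
q(\theta, \mathbf{z} \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)

% Lower bound on the log-likelihood that variational EM maximizes
\log p(\mathbf{w} \mid \alpha, \beta) \;\ge\;
  \mathbb{E}_q\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
  - \mathbb{E}_q\!\left[\log q(\theta, \mathbf{z})\right]
```

The E-step tightens the bound over $\gamma$ and $\phi$ for each document; the M-step maximizes it over the model parameters.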
Introduction to Topic Models
• Gibbs sampling
  – Applicable when joint distribution is hard to evaluate but conditional distribution is known
  – Sequence of samples comprises a Markov Chain
  – Stationary distribution of the chain is the joint distribution
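To make the idea concrete for LDA, here is a minimal sketch of a collapsed Gibbs sampler, in which $\theta$ and the topic-word distributions are integrated out and each topic assignment is resampled from its conditional given all the others. The slide describes Gibbs sampling in general; the collapsed variant, the function name, and the hyperparameter names `alpha` and `eta` are assumptions made for this illustration.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, eta=0.01, sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total words assigned to each topic
    z = []
    # Random initial assignment of a topic to every word position
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts
                k = z[d][n]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional P(z_n = j | all other assignments, words), up to a constant
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + eta) / (n_k[j] + V * eta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights, k=1)[0]
                # Record the new assignment
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw
```

Each inner step only needs the conditional distribution of one assignment, never the full joint, and the sequence of assignment vectors forms the Markov chain mentioned above.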
Introduction to Topic Models
• LDA topics
Introduction to Topic Models
• LDA’s view of a document
Introduction to Topic Models
• Perplexity comparison of various models
[Figure: perplexity curves for the Unigram, Mixture model, PLSA, and LDA models; lower is better]
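For reference, perplexity on held-out text is the exponential of the negative average per-word log-likelihood. A minimal sketch, assuming the per-document log-probabilities have already been computed by whichever model is being evaluated (the function name is hypothetical):

```python
import math

def perplexity(log_probs_per_doc, doc_lengths):
    """exp(- sum_d log p(w_d) / sum_d N_d), using natural logarithms."""
    total_log_prob = sum(log_probs_per_doc)
    total_words = sum(doc_lengths)
    return math.exp(-total_log_prob / total_words)

# Sanity check: a model that is uniform over 4 words has perplexity 4.
uniform_docs = [3 * math.log(0.25), 5 * math.log(0.25)]
value = perplexity(uniform_docs, [3, 5])
```

A perplexity of V means the model is, on average, as uncertain as a uniform guess over V words, which is why lower values indicate a better fit.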
Outline
• Part I: Introduction to Topic Models
  – Naive Bayes model
  – Mixture Models
    • Expectation Maximization
  – PLSA
  – LDA
    • Variational EM
    • Gibbs Sampling
• Part II: Topic Models for Community Analysis
  – Citation modeling with PLSA
  – Citation Modeling with LDA
  – Author Topic Model
  – Author Topic Recipient Model
  – Modeling influence of Citations
  – Mixed membership Stochastic Block Model
Hyperlink modeling using PLSA
Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
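[Plate diagram: d → z → w in a plate of size N and d → z → c in a plate of size L, all in a plate of size M]
• Select document $d \sim \mathrm{Mult}(\pi)$
• For each position $n = 1, \dots, N_d$
  • Generate $z_n \sim \mathrm{Mult}(\cdot \mid \theta_d)$
  • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta_{z_n})$
• For each citation $j = 1, \dots, L_d$
  • Generate $z_j \sim \mathrm{Mult}(\cdot \mid \theta_d)$
  • Generate $c_j \sim \mathrm{Mult}(\cdot \mid \gamma_{z_j})$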
Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
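[Plate diagram: the same model as on the previous slide, with words w in a plate of size N and citations c in a plate of size L]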
PLSA likelihood:
New likelihood:
Learning using EM
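The two likelihood formulas on this slide were not recovered. Based on the generative process of the model, they plausibly take the form below. The symbols are assumptions: $\pi_d$ is the document probability, $\theta_{dk}$ the topic proportions of document $d$, $\beta$ the topic-word distributions, and $\gamma$ the topic-citation distributions.

```latex
% PLSA likelihood (words only)
P(\mathbf{d}, \mathbf{w}) = \prod_{d=1}^{M} \pi_d
  \prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{dk}\, \beta_{k, w_{dn}}

% New likelihood (words and citations share the same topic proportions)
P(\mathbf{d}, \mathbf{w}, \mathbf{c}) = \prod_{d=1}^{M} \pi_d
  \left[ \prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{dk}\, \beta_{k, w_{dn}} \right]
  \left[ \prod_{j=1}^{L_d} \sum_{k=1}^{K} \theta_{dk}\, \gamma_{k, c_{dj}} \right]
```

The only change from plain PLSA is the second bracket: citations are explained by the same per-document topic proportions as the words, so both kinds of evidence shape the topics during EM.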
Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
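Heuristic: weight the content term of the log-likelihood by $\alpha$ and the hyperlink term by $(1 - \alpha)$
$0 \le \alpha \le 1$ determines the relative importance of content and hyperlinks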
Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
• Classification performance
[Figure: two classification-performance panels, each with an axis running from "Hyperlink" to "content"]
Hyperlink modeling using LDA
Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
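[Plate diagram: θ → z → w in a plate of size N and θ → z → c in a plate of size L, all in a plate of size M]
• For each document $d = 1, \dots, M$
  • Generate $\theta_d \sim \mathrm{Dir}(\cdot \mid \alpha)$
  • For each position $n = 1, \dots, N_d$
    • Generate $z_n \sim \mathrm{Mult}(\cdot \mid \theta_d)$
    • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta_{z_n})$
  • For each citation $j = 1, \dots, L_d$
    • Generate $z_j \sim \mathrm{Mult}(\cdot \mid \theta_d)$
    • Generate $c_j \sim \mathrm{Mult}(\cdot \mid \gamma_{z_j})$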
Learning using variational EM
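A short sampling sketch may help show how words and citations share one topic vector in this model. It is illustrative only: the function name is hypothetical, and `beta` and `gamma` are assumed names for the topic-word and topic-citation distributions.

```python
import random

def sample_linked_document(alpha, beta, gamma, num_words, num_citations, rng):
    """Sample topic proportions, then words and citations that both depend on them."""
    # theta_d ~ Dir(. | alpha), via normalized Gamma variates
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    theta = [g / total for g in draws]
    topics = range(len(alpha))

    def draw_items(table, count):
        # For each item: pick a topic from theta, then an item from that topic's row
        items = []
        for _ in range(count):
            z = rng.choices(topics, weights=theta, k=1)[0]
            item = rng.choices(range(len(table[z])), weights=table[z], k=1)[0]
            items.append(item)
        return items

    words = draw_items(beta, num_words)           # w_n ~ Mult(. | beta_{z_n})
    citations = draw_items(gamma, num_citations)  # c_j ~ Mult(. | gamma_{z_j})
    return theta, words, citations
```

Since `theta` is drawn once per document and reused for both loops, a document that is mostly about one topic will tend to use that topic's words and cite that topic's documents.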
Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
Author-Topic Model for Scientific Literature
Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI, 2004]
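[Plate diagram: a → x → z → w, with plates labeled N, M, A, P, and K]
• For each author $a = 1, \dots, A$
  • Generate $\theta_a \sim \mathrm{Dir}(\cdot \mid \alpha)$
• For each topic $k = 1, \dots, K$
  • Generate $\beta_k \sim \mathrm{Dir}(\cdot \mid \eta)$
• For each document $d = 1, \dots, M$
  • For each position $n = 1, \dots, N_d$
    • Generate author $x \sim \mathrm{Unif}(\cdot \mid a_d)$
    • Generate $z_n \sim \mathrm{Mult}(\cdot \mid \theta_x)$
    • Generate $w_n \sim \mathrm{Mult}(\cdot \mid \beta_{z_n})$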
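The per-word author choice is the step that distinguishes this model from LDA, and a small sketch makes it explicit. The function name and the toy parameters are hypothetical; `theta` holds one topic distribution per author and `beta` one word distribution per topic, both assumed to be already drawn from their Dirichlet priors.

```python
import random

def sample_author_topic_document(authors_d, theta, beta, length, rng):
    """Sample (author, topic, word) triples for one document with author list authors_d."""
    topics = range(len(beta))
    tokens = []
    for _ in range(length):
        x = rng.choice(authors_d)                                   # x ~ Unif(. | a_d)
        z = rng.choices(topics, weights=theta[x], k=1)[0]           # z_n from the chosen author's topics
        w = rng.choices(range(len(beta[z])), weights=beta[z], k=1)[0]  # w_n from the chosen topic
        tokens.append((x, z, w))
    return tokens

rng = random.Random(0)
tokens = sample_author_topic_document(
    authors_d=[0, 1],
    theta=[[0.9, 0.1], [0.2, 0.8]],
    beta=[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    length=12,
    rng=rng,
)
```

Topic proportions belong to authors rather than documents here, so a document's topics are a blend of the interests of its listed authors.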
Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI, 2004]
Learning: Gibbs sampling
[Plate diagram: the author-topic model from the previous slide]
Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI, 2004]
• Topic-Author visualization
Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]
Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]
Gibbs sampling
Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]
• Datasets
  – Enron email data
    • 23,488 messages between 147 users
  – McCallum’s personal email
    • 23,488(?) messages with 128 authors
Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]
• Topic Visualization: Enron set
Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]
• Topic Visualization: McCallum’s data
Modeling Citation Influences
Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]
• Citation influence model
Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]
• Citation influence graph for LDA paper
Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]
• Words in LDA paper assigned to citations
Link-PLSA-LDA: Topic Influence in Blogs (ICWSM 2008)
Ramesh Nallapati, Amr Ahmed, Eric Xing