![Page 1: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/1.jpg)
Hierarchical Topic Models and the Nested Chinese Restaurant Process
Blei, Griffiths, Jordan, Tenenbaum
presented by Rodrigo de Salvo Braz
![Page 2: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/2.jpg)
Document classification
• One-class approach: one topic per document, with words generated according to the topic.
• For example, a Naive Bayes model.
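As a rough illustration of the one-topic story (the generative view of a Naive Bayes classifier), the hypothetical function below draws a single topic for the whole document and then draws every word independently from that topic. The function name, the toy vocabulary, and the probabilities are made up for the example.

```python
import random


def generate_one_topic_doc(topic_prior, word_dists, n_words, rng):
    """Sample one document under a one-topic-per-document model.

    topic_prior[k] is P(topic k); word_dists[k] maps each word to P(word | topic k).
    """
    # A single topic is drawn once for the whole document...
    z = rng.choices(range(len(topic_prior)), weights=topic_prior, k=1)[0]
    vocab = list(word_dists[z])
    probs = [word_dists[z][w] for w in vocab]
    # ...and every word is drawn independently from that topic's word distribution.
    words = rng.choices(vocab, weights=probs, k=n_words)
    return z, words


# Toy example: two topics over a four-word vocabulary.
word_dists = [{"ball": 0.5, "goal": 0.5}, {"vote": 0.5, "law": 0.5}]
z, doc = generate_one_topic_doc([0.5, 0.5], word_dists, 10, random.Random(0))
print(z, doc)
```

Because the topic is fixed per document, every word of a generated document comes from the same word distribution.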
![Page 3: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/3.jpg)
Document classification
• It is more realistic to assume more than one topic per document.
• Generative model: pick a mixture distribution over K topics and generate words from it.
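With a mixture over K topics, each word first picks topic k with probability θ_k and then a word from that topic, so the document's words follow P(w) = Σ_k θ_k P(w | k). A small sketch of that arithmetic, with made-up topics and a hypothetical helper name:

```python
def mixture_word_dist(theta, word_dists):
    """Return P(w) = sum_k theta[k] * P(w | topic k) for a document with topic mixture theta."""
    mixed = {}
    for weight, dist in zip(theta, word_dists):
        for word, p in dist.items():
            mixed[word] = mixed.get(word, 0.0) + weight * p
    return mixed


word_dists = [{"ball": 0.5, "goal": 0.5}, {"vote": 0.5, "law": 0.5}]
# A document that is 70% "sports" and 30% "politics".
mixed = mixture_word_dist([0.7, 0.3], word_dists)
print(mixed)  # ball and goal get 0.35 each, vote and law 0.15 each
```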
![Page 4: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/4.jpg)
Document classification
• Even more realistic: topics may be organized in a hierarchy (not independent);
• Pick a path from the root to a leaf of a tree in which each node is a topic, and sample words from the mixture of the topics on that path.
![Page 5: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/5.jpg)
Dirichlet distribution (DD)
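• Distribution over probability vectors $\mathbf{p}$ of dimension $K$: $P(\mathbf{p}; \mathbf{u}, \alpha) = \frac{1}{Z(\alpha\mathbf{u})} \prod_{i=1}^{K} p_i^{\alpha u_i - 1}$, where $\mathbf{u}$ is a distribution ($\sum_i u_i = 1$), $\alpha > 0$, and $Z(\alpha\mathbf{u}) = \prod_{i} \Gamma(\alpha u_i) / \Gamma(\alpha)$;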
• The parameters encode a prior distribution, acting like counts of “previous observations”;
• A symmetric Dirichlet distribution assumes a uniform prior distribution ($u_i = u_j$ for all $i, j$).
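The Dirichlet density can be evaluated with log-gamma functions, and a draw can be made by normalizing independent Gamma variates (a standard construction). The sketch below assumes the mean/concentration parameterization u (summing to 1) and α > 0; the function names are hypothetical.

```python
import math
import random


def dirichlet_log_density(p, u, alpha):
    """log P(p; u, alpha) for a Dirichlet with mean u (sum(u) == 1) and concentration alpha."""
    a = [alpha * ui for ui in u]
    # log Z(alpha * u) = sum_i log Gamma(alpha * u_i) - log Gamma(alpha)
    log_z = sum(math.lgamma(ai) for ai in a) - math.lgamma(sum(a))
    return sum((ai - 1.0) * math.log(pi) for ai, pi in zip(a, p)) - log_z


def sample_dirichlet(u, alpha, rng):
    """Draw p ~ Dir(alpha * u) by normalizing independent Gamma(alpha * u_i, 1) draws."""
    g = [rng.gammavariate(alpha * ui, 1.0) for ui in u]
    total = sum(g)
    return [gi / total for gi in g]


# Symmetric case: u_i = 1/K for every i. With alpha = K the density is flat on the simplex.
K = 3
u = [1.0 / K] * K
p = sample_dirichlet(u, 3.0, random.Random(0))
print(p, dirichlet_log_density(p, u, 3.0))  # the log density is log 2 for any p here
```

For very small α u_i the Gamma draws can underflow to zero; this sketch does not guard against that.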
![Page 6: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/6.jpg)
Latent Dirichlet Allocation (LDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a Dirichlet distribution;
• For each word, pick a topic according to this distribution and generate the word from that topic's word distribution.
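A minimal sketch of the LDA generative process for a single document, assuming the K topic-word distributions are already given. The function names and toy topics are illustrative, not from the paper; the Dirichlet parameters alpha are passed as a list with one entry per topic.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from Dir(params) by normalizing independent Gamma(params[i], 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


def generate_lda_doc(alpha, topics, vocab, n_words, rng):
    """Generate one document: alpha has length K; topics[k][j] is P(vocab[j] | topic k)."""
    # 1. Per-document mixture over the K topics.
    theta = sample_dirichlet(alpha, rng)
    # 2. For each word position, pick a topic according to theta.
    z = rng.choices(range(len(topics)), weights=theta, k=n_words)
    # 3. Draw each word from the word distribution of its topic.
    words = [rng.choices(vocab, weights=topics[k], k=1)[0] for k in z]
    return theta, z, words


vocab = ["ball", "goal", "vote", "law"]
topics = [[0.5, 0.5, 0.0, 0.0],   # a "sports" topic
          [0.0, 0.0, 0.5, 0.5]]   # a "politics" topic
theta, z, words = generate_lda_doc([1.0, 1.0], topics, vocab, 20, random.Random(0))
print([round(t, 2) for t in theta], words)
```

Unlike the one-topic model, a single generated document can mix words from both topics.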
![Page 7: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/7.jpg)
Latent Dirichlet Allocation (LDA)
[Figure: LDA graphical model with plates labeled K and W; nodes for the words w, the topics, the topic distribution, and the DD hyperparameter.]
![Page 8: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/8.jpg)
Chinese Restaurant Process (CRP)
1 out of 9 customers
![Page 9: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/9.jpg)
Chinese Restaurant Process (CRP)
2 out of 9 customers
![Page 10: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/10.jpg)
Chinese Restaurant Process (CRP)
3 out of 9 customers
![Page 11: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/11.jpg)
Chinese Restaurant Process (CRP)
4 out of 9 customers
![Page 12: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/12.jpg)
Chinese Restaurant Process (CRP)
5 out of 9 customers
![Page 13: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/13.jpg)
Chinese Restaurant Process (CRP)
6 out of 9 customers
![Page 14: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/14.jpg)
Chinese Restaurant Process (CRP)
7 out of 9 customers
![Page 15: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/15.jpg)
Chinese Restaurant Process (CRP)
8 out of 9 customers
![Page 16: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/16.jpg)
Chinese Restaurant Process (CRP)
9 out of 9 customers
A data point (itself a distribution) is sampled.
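The seating rule illustrated by these slides can be simulated directly. In this sketch the concentration parameter is called gamma (an assumption about notation): the next customer joins an occupied table with probability proportional to the number of people already there, and opens a new table with probability proportional to gamma.

```python
import random


def crp_seating(n_customers, gamma, rng):
    """Seat customers one at a time under a Chinese restaurant process.

    After n customers, the next one sits at occupied table k with probability
    counts[k] / (n + gamma) and at a new table with probability gamma / (n + gamma).
    Returns each customer's table index and the final table sizes.
    """
    assignments, counts = [], []
    for _ in range(n_customers):
        weights = counts + [gamma]          # occupied tables, then "new table"
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)                # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = crp_seating(9, 1.0, random.Random(0))
print(assignments, counts)
```

`rng.choices` normalizes the weights itself, so the denominator n + gamma never has to be computed explicitly.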
![Page 17: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/17.jpg)
Species Sampling Mixture
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a CRP prior;
• For each word, pick a topic according to this distribution and generate the word from that topic's word distribution.
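One possible reading of this model, sketched under assumptions: each word is a customer, and each new table receives a fresh topic drawn from a symmetric Dirichlet over the vocabulary (the base distribution and its parameter eta are assumptions for illustration). Unlike LDA, the number of topics is not fixed in advance but grows as new tables are opened.

```python
import random


def symmetric_dirichlet(dim, eta, rng):
    """Draw a distribution over `dim` outcomes from a symmetric Dirichlet(eta)."""
    g = [rng.gammavariate(eta, 1.0) for _ in range(dim)]
    total = sum(g)
    return [x / total for x in g]


def crp_mixture_words(n_words, gamma, eta, vocab, rng):
    """Generate words whose topics are assigned by a CRP with concentration gamma.

    Each new table draws its own topic (a word distribution) from a symmetric
    Dirichlet(eta) base distribution, so the number of topics is unbounded.
    """
    counts, topics, words = [], [], []
    for _ in range(n_words):
        k = rng.choices(range(len(counts) + 1), weights=counts + [gamma], k=1)[0]
        if k == len(counts):                 # new table: sample a new topic
            counts.append(0)
            topics.append(symmetric_dirichlet(len(vocab), eta, rng))
        counts[k] += 1
        words.append(rng.choices(vocab, weights=topics[k], k=1)[0])
    return words, topics, counts


vocab = ["ball", "goal", "vote", "law"]
words, topics, counts = crp_mixture_words(30, 1.0, 1.0, vocab, random.Random(0))
print(len(topics), counts, words)
```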
![Page 18: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/18.jpg)
Species Sampling Mixture
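[Figure: species sampling mixture graphical model with plates labeled K and W; nodes for the words w, the topics, the topic distribution, and the CRP hyperparameter.]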
![Page 19: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/19.jpg)
Nested CRP
[Figure: a tree of restaurants; customers 1 to 6 each appear once at each of three levels.]
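The nested CRP can be sketched as a tree of restaurants: each customer (a document) starts at the root, picks a table with the ordinary CRP rule, and each table leads to a child restaurant at the next level. The class and function names, the fixed depth, and the parameter name gamma are assumptions for this example.

```python
import random


class Restaurant:
    """A node of the nested CRP tree; table k leads to child restaurant children[k]."""

    def __init__(self):
        self.children = []  # one child restaurant per occupied table
        self.counts = []    # customers seated at each table so far


def ncrp_path(root, depth, gamma, rng):
    """Draw a root-to-leaf path of `depth` restaurants for one customer (document)."""
    node, path = root, [root]
    for _ in range(depth - 1):
        weights = node.counts + [gamma]      # CRP rule inside this restaurant
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(node.children):          # new table -> new child restaurant
            node.children.append(Restaurant())
            node.counts.append(0)
        node.counts[k] += 1
        node = node.children[k]
        path.append(node)
    return path


rng = random.Random(0)
root = Restaurant()
paths = [ncrp_path(root, 3, 1.0, rng) for _ in range(6)]  # six customers, three levels
print(root.counts)  # how the six customers split at the root
```

Popular branches attract more customers, so documents tend to share upper parts of their paths while still being able to open new branches.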
![Page 20: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/20.jpg)
Hierarchical LDA (hLDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a nested CRP prior;
• For each word, pick a topic according to this distribution and generate the word from that topic's word distribution.
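Putting the pieces together, a sketch of generating documents under hLDA: choose a path with the nested CRP, draw proportions over the path's levels from a Dirichlet, then for each word pick a level and emit a word from the topic at that node. Giving each new node a topic drawn from a symmetric Dirichlet(eta) over the vocabulary, and the parameter names alpha, gamma, and eta, are assumptions for this illustration.

```python
import random


def dirichlet(params, rng):
    """Draw from Dir(params) by normalizing independent Gamma(params[i], 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


class Node:
    """A topic in the tree; its tables lead to child nodes."""

    def __init__(self, vocab_size, eta, rng):
        self.topic = dirichlet([eta] * vocab_size, rng)  # word distribution
        self.children, self.counts = [], []


def generate_hlda_doc(root, depth, n_words, vocab, alpha, gamma, eta, rng):
    # 1. Choose a root-to-leaf path with the nested CRP.
    node, path = root, [root]
    for _ in range(depth - 1):
        k = rng.choices(range(len(node.counts) + 1),
                        weights=node.counts + [gamma], k=1)[0]
        if k == len(node.children):          # new table -> new child topic
            node.children.append(Node(len(vocab), eta, rng))
            node.counts.append(0)
        node.counts[k] += 1
        node = node.children[k]
        path.append(node)
    # 2. Mixture over the levels of this path (alpha has one entry per level).
    theta = dirichlet(alpha, rng)
    # 3. For each word, pick a level, then a word from that level's topic.
    levels = rng.choices(range(depth), weights=theta, k=n_words)
    words = [rng.choices(vocab, weights=path[lvl].topic, k=1)[0] for lvl in levels]
    return path, levels, words


rng = random.Random(0)
vocab = ["the", "model", "topic", "tree", "bayes", "graph"]
root = Node(len(vocab), 1.0, rng)
docs = [generate_hlda_doc(root, 3, 15, vocab, [1.0, 1.0, 1.0], 1.0, 1.0, rng)
        for _ in range(4)]
print([words for _, _, words in docs])
```

Every document uses the root topic, so the root tends to capture words common to the whole collection, while deeper nodes are shared by fewer documents.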
![Page 21: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/21.jpg)
hLDA graphical model
![Page 22: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/22.jpg)
Artificial data experiment
100 documents of 1,000 words each, over a 25-term vocabulary
Each vertical bar is a topic
![Page 23: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/23.jpg)
CRP prior vs. Bayes Factors
![Page 24: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/24.jpg)
Predicting the structure
![Page 25: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/25.jpg)
NIPS abstracts
![Page 26: Hierarchical Topic Models and the Nested Chinese Restaurant Process](https://reader036.vdocument.in/reader036/viewer/2022062323/568159e3550346895dc72e91/html5/thumbnails/26.jpg)
Comments
• Accommodates growing collections of data;
• Hierarchical organization makes sense, but it is not clear to me why the CRP is the best prior for it;
• No mention of running time; inference may take a very long time.