Slide 1
ML A 11.1.2010
Figures and References to Topic Models, with Applications to Document Classification
Wolfgang Maass
Institut für Grundlagen der Informationsverarbeitung (Institute for Theoretical Computer Science)
Technische Universität Graz, Austria
http://www.igi.tugraz.at/maass/
Slide 2
Examples of topics (that have emerged from unsupervised learning on a collection of 37,000 documents)
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Slide 3
Example of a document in which a topic has been assigned to each (relevant) word; in other words, the latent z-variables are indicated for each word.
T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 5228-5235, 2004.
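As an illustration of what these z-variables amount to as a data structure, here is a minimal Python sketch (not the authors' code; the words and topic indices are invented) in which every relevant word token of a document carries one latent topic assignment:

# Minimal sketch: each (relevant) word token w_i of a document is paired
# with a latent topic index z_i. Words and topic numbers are made up.
doc_words = ["gene", "expression", "data", "analysis", "brain", "neuron"]
z = [1, 1, 2, 2, 3, 3]  # latent topic assignment z_i for each word w_i

for word, topic in zip(doc_words, z):
    print(f"{word:12s} -> topic {topic}")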
Slide 4
The same word can occur in several topics (but in general receives a different probability in each topic).
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
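A minimal sketch of a word-topic table (the topic names and probabilities below are invented for illustration, not taken from the corpus) in which the same word appears under several topics with different probabilities:

# phi[topic][word] = P(word | topic); "play" occurs in all three topics,
# but with a different probability in each. All values are made up.
phi = {
    "MUSIC":   {"play": 0.030, "song": 0.050, "band": 0.040},
    "THEATRE": {"play": 0.080, "stage": 0.045, "actor": 0.035},
    "SPORTS":  {"play": 0.020, "ball": 0.060, "team": 0.055},
}
for topic, dist in phi.items():
    print(f"P(play | {topic}) = {dist['play']:.3f}")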
Slide 5
Here the latent z-variables select the right topic for the word "play" in each of the three documents.
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
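How the document context performs this disambiguation can be sketched as follows (invented numbers; in a standard topic model the posterior over z for a word is proportional to P(word | topic) times the document's weight for that topic):

# P(z = topic | word, doc) is proportional to phi[topic][word] * theta[doc][topic],
# so the document's overall topic mixture theta picks the sense of "play".
phi_play  = {"MUSIC": 0.030, "THEATRE": 0.080, "SPORTS": 0.020}  # P(play | topic)
theta_doc = {"MUSIC": 0.70,  "THEATRE": 0.20,  "SPORTS": 0.10}   # P(topic | this doc)

scores = {t: phi_play[t] * theta_doc[t] for t in phi_play}
total = sum(scores.values())
posterior = {t: round(s / total, 3) for t, s in scores.items()}
print(posterior)  # MUSIC receives the highest posterior in this document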
Slide 6
Graphical model for the joint distribution of a topic model
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
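For reference, the joint distribution encoded by such a graphical model can be written in the standard form used for these topic models (a sketch in the usual LDA notation with T topics, D documents, N_d words in document d, and Dirichlet priors alpha and beta; the slide's own notation may differ slightly):

P(w, z, \theta, \phi \mid \alpha, \beta)
  \;=\; \prod_{j=1}^{T} p\big(\phi^{(j)} \mid \beta\big)
        \prod_{d=1}^{D} p\big(\theta^{(d)} \mid \alpha\big)
        \prod_{i=1}^{N_d} P\big(z_{d,i} \mid \theta^{(d)}\big)\, P\big(w_{d,i} \mid \phi^{(z_{d,i})}\big)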
Slide 7
A toy example
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
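The generative process behind this toy example (as described on the next slide) can be sketched in a few lines of Python; the document length, mixing weight, and random seed below are assumed values:

import random

# Topic 1 puts probability 1/3 on each of Bank/Money/Loan,
# topic 2 puts probability 1/3 on each of River/Stream/Bank.
topics = {1: ["Bank", "Money", "Loan"], 2: ["River", "Stream", "Bank"]}

def generate_document(theta1, n_words=16, seed=0):
    """theta1 = P(topic 1) in this document; returns the words and their true z."""
    rng = random.Random(seed)
    words, z = [], []
    for _ in range(n_words):
        topic = 1 if rng.random() < theta1 else 2
        words.append(rng.choice(topics[topic]))  # each of the 3 words has prob. 1/3
        z.append(topic)
    return words, z

print(generate_document(theta1=0.75))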
Slide 8
Performance of Gibbs sampling for this toy example: documents were generated by mixing 2 topics in different ways, where topic 1 assigned probability 1/3 each to Bank, Money, and Loan, and topic 2 assigned probability 1/3 each to River, Stream, and Bank.
M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Topic assignments to words are indicated by color (black/white).
Initially, topics are randomly assigned to words.
After Gibbs sampling, the 2 original topics are recovered from the documents.
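A minimal collapsed Gibbs sampler in the spirit of Griffiths and Steyvers (2004) is sketched below; the hyperparameters alpha and beta, the iteration count, and the tiny demo documents are assumed values, not taken from the slides:

import random

def gibbs_topic_model(docs, T, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling: resample each word token's topic z_i in turn."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    W = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}

    # Random initial topic assignment for every word token.
    z = [[rng.randrange(T) for _ in doc] for doc in docs]
    n_dt = [[0] * T for _ in docs]      # document-topic counts
    n_tw = [[0] * W for _ in range(T)]  # topic-word counts
    n_t = [0] * T                       # total words assigned to each topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d][t] += 1; n_tw[t][wid[w]] += 1; n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], wid[w]
                # Remove the current token from all counts ...
                n_dt[d][t] -= 1; n_tw[t][v] -= 1; n_t[t] -= 1
                # ... compute P(z_i = j | z_-i, w) up to a constant factor ...
                p = [(n_tw[j][v] + beta) / (n_t[j] + W * beta) *
                     (n_dt[d][j] + alpha) for j in range(T)]
                # ... and draw a new topic for this token.
                t = rng.choices(range(T), weights=p)[0]
                z[d][i] = t
                n_dt[d][t] += 1; n_tw[t][v] += 1; n_t[t] += 1
    return z, n_tw, vocab

docs = [["Bank", "Money", "Loan", "Bank", "Money", "Loan"],
        ["River", "Stream", "Bank", "River", "Stream", "Bank"],
        ["Money", "Bank", "Loan", "River", "Stream", "Bank"]]
z, n_tw, vocab = gibbs_topic_model(docs, T=2)
print(vocab)
print(n_tw)  # after sampling, the two rows should separate money-words from river-words

The per-document factor's denominator (n_d - 1 + T*alpha) is the same for every topic, so it is dropped before normalizing the sampling weights.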
Slide 9
Application to real-world data: 28,000 abstracts from PNAS
T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 5228-5235, 2004.
Topics chosen by humans are on the y-axis; topics chosen by the algorithm are on the x-axis.
The darkness of a pixel indicates the mean probability of the algorithm's topic over all abstracts belonging to the human-chosen category.
Below are the 5 words with the highest probability for each of the algorithm-generated topics.
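The two summaries described on this slide could be computed along the following lines (a sketch with hypothetical variable names: theta as the abstract-topic probability matrix, phi as the topic-word probability matrix, and category as the human-chosen label per abstract):

import numpy as np

def summarize(theta, category, phi, vocab, top_n=5):
    """Mean topic probability per human category, and top-n words per algorithm topic."""
    cats = sorted(set(category))
    mean_prob = np.array([theta[[i for i, c in enumerate(category) if c == cat]].mean(axis=0)
                          for cat in cats])              # rows: human categories, cols: algorithm topics
    top_words = [[vocab[i] for i in np.argsort(-phi[t])[:top_n]]
                 for t in range(phi.shape[0])]           # highest-probability words per algorithm topic
    return cats, mean_prob, top_words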