
Slide 1

ML A 11.1.2010

Figures and References to Topic Models, with

Applications to Document Classification

Wolfgang Maass

Institut für Grundlagen der Informationsverarbeitung

Technische Universität Graz, Austria

Institute for Theoretical Computer Science http://www.igi.tugraz.at/maass/

Slide 2

Examples of topics that emerged from unsupervised learning on a collection of 37,000 documents

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 3

Example of a document in which a topic has been assigned to each (relevant) word; in other words, the latent z-variable is indicated for each word.

T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, pp. 5228-5235, 2004.

Slide 4

The same word can occur in several topics (but in general receives a different probability in each topic).

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
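The point of this slide can be made concrete with a small numeric sketch. All numbers below are made up for illustration: the word "bank" appears in both a finance topic and a nature topic, but with different probabilities, and a document's topic mixture theta determines the word's overall probability in that document.

```python
# Hypothetical topic-word distributions phi (each sums to 1).
# "bank" occurs in both topics, with a different probability in each.
phi = {
    "finance": {"bank": 0.35, "money": 0.40, "loan": 0.25},
    "nature":  {"bank": 0.25, "river": 0.40, "stream": 0.35},
}

# Hypothetical topic mixture theta for one document.
theta = {"finance": 0.8, "nature": 0.2}

# P(w | d) = sum over topics k of P(k | d) * P(w | k)
p_bank = sum(theta[k] * phi[k]["bank"] for k in theta)
print(round(p_bank, 2))  # 0.8*0.35 + 0.2*0.25 = 0.33
```

A document dominated by the finance topic thus assigns "bank" a higher overall probability than one dominated by the nature topic, even though the word itself is shared.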

Slide 5

Here the latent z-variables select the correct topic for the word "play" in each of the three documents.

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 6

Graphical model for the joint distribution of a topic model

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
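The joint distribution that this graphical model encodes can be written out explicitly. In the standard LDA formulation (as used by Steyvers and Griffiths), with T topics, D documents, N_d words in document d, topic-word distributions φ^{(j)}, document-topic mixtures θ^{(d)}, and Dirichlet hyperparameters β and α:

$$
P(\mathbf{w},\mathbf{z},\theta,\phi \mid \alpha,\beta)
= \prod_{j=1}^{T} p\!\left(\phi^{(j)} \mid \beta\right)\;
  \prod_{d=1}^{D} \left[ p\!\left(\theta^{(d)} \mid \alpha\right)
  \prod_{i=1}^{N_d} P\!\left(z_{d,i} \mid \theta^{(d)}\right)
  P\!\left(w_{d,i} \mid \phi^{(z_{d,i})}\right) \right]
$$

Each word w_{d,i} is generated by first drawing its topic indicator z_{d,i} from the document's mixture θ^{(d)}, then drawing the word from that topic's distribution φ^{(z_{d,i})}.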

Slide 7

A toy example

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 8

Performance of Gibbs sampling for this toy example: documents were generated by mixing two topics in different ways, where topic 1 assigned probability 1/3 each to Bank, Money, and Loan, and topic 2 assigned probability 1/3 each to River, Stream, and Bank.

M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Topic assignments to words are indicated by color (b/w).

Initially, topics are randomly assigned to words.

After Gibbs sampling, the two original topics are recovered from the documents.
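The recovery described above can be sketched with a collapsed Gibbs sampler on a toy two-topic corpus in the spirit of the slide. The documents, hyperparameters, and iteration count below are illustrative assumptions, not taken from the slides:

```python
import random
from collections import defaultdict

random.seed(0)

vocab = ["bank", "money", "loan", "river", "stream"]
docs = [
    ["money", "loan", "bank", "money", "loan", "bank"],
    ["river", "stream", "bank", "river", "stream", "bank"],
    ["money", "bank", "river", "loan", "stream", "bank"],
]
K, V = 2, len(vocab)          # number of topics, vocabulary size
alpha, beta = 1.0, 0.1        # Dirichlet hyperparameters

# Random initial topic assignment for every word (as on the slide).
z = [[random.randrange(K) for _ in doc] for doc in docs]
ndk = [[0] * K for _ in docs]                # topic counts per document
nkw = [defaultdict(int) for _ in range(K)]   # word counts per topic
nk = [0] * K                                 # total words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            # Remove the current assignment from the counts,
            t = z[d][i]
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # sample z_i from its full conditional given all other z's,
            weights = [(nkw[k][w] + beta) / (nk[k] + V * beta)
                       * (ndk[d][k] + alpha) for k in range(K)]
            r = random.random() * sum(weights)
            for t, wt in enumerate(weights):
                r -= wt
                if r <= 0:
                    break
            # and write the new assignment back into the counts.
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

for k in range(K):
    p = {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
    print(f"topic {k}:", sorted(p, key=p.get, reverse=True)[:3])
```

With enough iterations the sampler typically separates the finance words (bank, money, loan) from the nature words (river, stream, bank), with "bank" receiving probability mass under both topics, mirroring the slide's result.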

Slide 9

Application to real-world data: 28,000 abstracts from PNAS

T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, pp. 5228-5235, 2004.

Topics chosen by humans are on the y-axis; topics chosen by the algorithm are on the x-axis.

The darkness of a pixel indicates the mean probability of the algorithm's topic over all abstracts belonging to the human-chosen category.

Below are the 5 words with the highest probability for each of the algorithm-generated topics.
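The comparison matrix this slide describes is straightforward to compute: average each algorithm topic's probability over the abstracts of each human-chosen category. The sketch below uses made-up categories and probabilities purely for illustration:

```python
# Each abstract carries its human-chosen category and the topic
# distribution inferred by the algorithm (hypothetical numbers).
abstracts = [
    {"category": "Neuroscience", "topic_probs": [0.7, 0.2, 0.1]},
    {"category": "Neuroscience", "topic_probs": [0.6, 0.3, 0.1]},
    {"category": "Genetics",     "topic_probs": [0.1, 0.8, 0.1]},
]

categories = sorted({a["category"] for a in abstracts})
n_topics = len(abstracts[0]["topic_probs"])

# matrix[category][j] = mean probability of algorithm topic j
# over all abstracts in that category (the "darkness" of a pixel).
matrix = {}
for c in categories:
    group = [a["topic_probs"] for a in abstracts if a["category"] == c]
    matrix[c] = [sum(p[j] for p in group) / len(group) for j in range(n_topics)]

for c in categories:
    print(c, [round(v, 2) for v in matrix[c]])
```

A dark cell then signals that the algorithm's topic concentrates its probability mass on abstracts from that human-chosen category, which is how the figure shows the correspondence between the two labelings.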

Slide 10