Incorporating Hierarchical Dirichlet Process into Tag Topic Model

张明, 2013.4


Page 1

Incorporating Hierarchical Dirichlet Process into Tag Topic Model

张明 2013.4

Page 2

Agenda

Introduction
Tag-Topic Model
Tag Hierarchical Dirichlet Process
Experiments and evaluation
Conclusion

Page 3

Introduction

With the rapid development of Web 2.0, the Internet has brought a large amount of resources, such as blogs, Twitter, and online encyclopedias.

These resources contain a wealth of information that can be applied to a variety of fields in information processing to improve service quality, but relying on traditional human expertise alone is far from sufficient to process it.

Page 4

Introduction

In NLP, computer programs face many tasks that require human-level intelligence; in other words, the programs need to be endowed with the ability to understand language.

One of the core issues is how to automatically acquire knowledge and use it effectively for semantic analysis and computation.

Page 5

Introduction

Tagging has recently emerged as a popular way to organize user-generated content in Web 2.0 applications such as blogs and bookmarks. In blogs, users can assign one or more tags to each post, and these tags usually reflect the subjects the content is concerned with. Tags can be seen as labeled meta-information about the content, and they are beneficial for mining knowledge from blogs.

Page 6

Introduction

In this paper, we extend the Tag Topic Model (TTM) [1] by incorporating the HDP as its prior distribution. We assume that, before writing a blog post, an author already has in mind which aspects the content will cover, and that for each aspect he chooses a tag to describe it.

Page 7

Agenda

Introduction
Tag-Topic Model
Tag Hierarchical Dirichlet Process
Experiments and evaluation
Conclusion

Page 8

LDA Generative Model
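This slide shows the LDA plate diagram. For reference, a minimal sketch of the standard LDA generative process is given below; the function name, the hyperparameters alpha and beta, and the corpus sizes are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Minimal sketch of the standard LDA generative process (illustrative, not the paper's code).
# K topics, V vocabulary words; alpha and beta are symmetric Dirichlet hyperparameters.
def generate_lda_corpus(n_docs, doc_len, K=10, V=1000, alpha=0.1, beta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet([beta] * V, size=K)    # topic-word distributions
    corpus = []
    for _ in range(n_docs):
        theta = rng.dirichlet([alpha] * K)     # per-document topic mixture
        doc = []
        for _ in range(doc_len):
            z = rng.choice(K, p=theta)         # choose a topic for this word
            w = rng.choice(V, p=phi[z])        # choose a word from that topic
            doc.append(w)
        corpus.append(doc)
    return corpus

docs = generate_lda_corpus(n_docs=5, doc_len=50)
```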

Page 9

Tag-Topic Model

Basic idea: each document is associated with a mixture of tags, each tag can be viewed as a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words.
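A minimal sketch of that generative story, assuming symmetric Dirichlet priors as in LDA-style models; the variable names (psi, phi) and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of the Tag-Topic Model generative story described above (illustrative only).
# psi[t] is tag t's multinomial over K topics; phi[k] is topic k's multinomial over V words.
def generate_ttm_doc(doc_tags, psi, phi, doc_len, rng):
    K, V = phi.shape
    words = []
    for _ in range(doc_len):
        tag = rng.choice(doc_tags)        # each word picks one of the document's tags
        z = rng.choice(K, p=psi[tag])     # the tag gives a distribution over topics
        w = rng.choice(V, p=phi[z])       # the topic gives a distribution over words
        words.append((tag, z, w))
    return words

rng = np.random.default_rng(0)
K, V, T = 20, 5000, 100
psi = rng.dirichlet([0.1] * K, size=T)    # tag-topic distributions
phi = rng.dirichlet([0.01] * V, size=K)   # topic-word distributions
doc = generate_ttm_doc(doc_tags=[3, 17, 42], psi=psi, phi=phi, doc_len=80, rng=rng)
```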

Page 10

Tag-Topic Model

Page 11

Agenda

Introduction
Tag-Topic Model
Tag Hierarchical Dirichlet Process
Experiments and evaluation
Conclusion

Page 12

THDP

The THDP topic model draws upon the strengths of the two models (TTM and HDP), using the topic-based representation to model both the content of documents and the tags. In the THDP model, a group of tags, Td, indicates the main purpose of the blog post. For each word in the document, a tag is chosen uniformly at random. Then, as in a topic model, a topic is chosen from a distribution over topics specific to that tag, and the word is generated from the chosen topic.

Page 13

THDP

Page 14

THDP

Given an underlying measure H on multinomial probability vectors, we select a random measure G0 which provides a countably infinite collection of multinomial probability vectors; these can be viewed as the set of all topics that can be used in a given corpus. For the l-th tag in the j-th document in the corpus we sample Gj using G0 as a base measure; this selects a specific subset of topics to be used by tag l in document j. From Gj we then generate a document by repeatedly (1) choosing a tag with equal probability from the tag set associated with the document, and (2) sampling a specific multinomial probability vector zji from Gj and sampling the word wji with probabilities zji. The overlap among the random measures Gj implements the sharing of topics among documents.
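As a rough illustration of this hierarchy, the sketch below uses a truncated stick-breaking approximation with Kmax shared topics and a single per-document Gj (the slides sample a measure per tag, so this is a simplification); gamma, alpha0, and all sizes are placeholder values, not the paper's settings.

```python
import numpy as np

# Truncated stick-breaking sketch of the hierarchy described above (illustrative only).
# G0 ~ DP(gamma, H) gives corpus-level weights over Kmax shared topics; each document's
# Gj ~ DP(alpha0, G0) reweights those same topics, which shares topics across documents.
def stick_breaking(rng, concentration, size):
    betas = rng.beta(1.0, concentration, size=size)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

def sample_thdp_doc(rng, topics, g0_weights, doc_tags, doc_len, alpha0=1.0):
    Kmax, V = topics.shape
    # On a fixed finite set of atoms, Gj ~ DP(alpha0, G0) has Dirichlet(alpha0 * G0) weights.
    gj = rng.dirichlet(alpha0 * g0_weights + 1e-12)
    words = []
    for _ in range(doc_len):
        tag = rng.choice(doc_tags)        # (1) choose a tag uniformly from the document's tag set
        z = rng.choice(Kmax, p=gj)        # (2) sample a topic (a multinomial vector) from Gj
        w = rng.choice(V, p=topics[z])    #     and sample the word from that topic
        words.append((tag, z, w))
    return words

rng = np.random.default_rng(0)
Kmax, V = 50, 5000
topics = rng.dirichlet([0.01] * V, size=Kmax)           # atoms drawn from the base measure H
g0_weights = stick_breaking(rng, concentration=1.0, size=Kmax)
g0_weights /= g0_weights.sum()                          # renormalize the truncated weights
doc = sample_thdp_doc(rng, topics, g0_weights, doc_tags=[5, 12], doc_len=60)
```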

Page 15

Agenda

Introduction
Tag-Topic Model
Tag Hierarchical Dirichlet Process
Experiments and evaluation
Conclusion

Page 16

Experiments and evaluation

Dataset

The dataset used in the experiment is drawn from the blog corpus covering October 2011 to December 2012, constructed by the National Language Resources Monitoring and Research Center, Network Media Branch. We filter out blog posts with no tags or with fewer than 100 words, and apply preprocessing such as removing stop words and extremely common words, filtering out non-nominal words, and retaining only nouns and nominal phrases. The resulting dataset contains the tags and content of N = 927 blog posts, with W = 10438 words in the vocabulary and T = 558 tags.
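A hedged sketch of that filtering pipeline is given below; the tokenizer, POS tagger, record schema, and the common-word cutoff are placeholders, since the slides do not name the actual tools or thresholds used.

```python
from collections import Counter

# Illustrative sketch of the preprocessing described above; tokenize and pos_tag are
# stand-ins for whatever segmenter and POS tagger were actually used, and the record
# schema ({"text": ..., "tags": [...]}) is hypothetical.
def preprocess(posts, tokenize, pos_tag, stopwords, common_cutoff=0.2):
    kept = []
    for post in posts:
        tokens = tokenize(post["text"])
        if not post["tags"] or len(tokens) < 100:   # drop untagged or very short posts
            continue
        nouns = [w for w, t in pos_tag(tokens) if t.startswith("N")]  # keep nominal words only
        kept.append({"tags": post["tags"], "words": nouns})
    # Remove stop words and extremely common words (document frequency above the cutoff).
    df = Counter(w for p in kept for w in set(p["words"]))
    too_common = {w for w, c in df.items() if c > common_cutoff * len(kept)}
    for p in kept:
        p["words"] = [w for w in p["words"] if w not in stopwords and w not in too_common]
    return kept
```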

Page 17

Experiments and evaluation

The perplexities of TTM and THDP for different numbers of topics.
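Perplexity here is the usual held-out measure: the exponential of the negative average log-likelihood per word. A minimal, model-agnostic sketch (the per-token log-probabilities come from whatever trained model is being evaluated):

```python
import math

# Held-out perplexity: exp of the negative average log-likelihood per word.
# log_probs holds log p(w | model) for every held-out token, however the model computes it.
def perplexity(log_probs):
    return math.exp(-sum(log_probs) / len(log_probs))

# Example: if every held-out word gets probability 0.01, perplexity is 100.
assert abs(perplexity([math.log(0.01)] * 500) - 100.0) < 1e-6
```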

Page 18

Experiments and evaluation

The topic number at different iterations of THDP.

Page 19

Experiments and evaluation

An illustration of 8 topics from the 114-topic solution for the dataset. Each topic is shown with the 10 words and 5 tags that have the highest probability conditioned on that topic.

Page 20

Agenda

Introduction
Tag-Topic Model
Tag Hierarchical Dirichlet Process
Experiments and evaluation
Conclusion

Page 21

Conclusion

In this paper, we propose the THDP model. The model uses the HDP as the prior distribution of the TTM, which infers the number of topics in the dataset automatically, links the tags to the topics of the documents, and captures the semantics of a tag in the form of a topic distribution. Example results on the dataset demonstrate the consistent and promising performance of the proposed THDP, and its computational expense is comparable to that of related topic models.

Page 22

Thank you