Chinese Blog Clustering by Hidden Sentiment Factors. ADMA 2009. Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang. College of Information Science and Engineering, Northeastern University.

Page 1: Chinese Blog Clustering by Hidden Sentiment Factors

Chinese Blog Clustering by Hidden Sentiment Factors

ADMA 2009
Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang
College of Information Science and Engineering, Northeastern University

Page 2: Chinese Blog Clustering by Hidden Sentiment Factors

Hidden Sentiment Factors (HSF)

• Probabilistic latent semantic analysis (PLSA)
– Blog set B = {b1, b2, …, bN}
– Sentiment word set W = {w1, w2, …, wM}
  • NTUSD: 2,812 positive words and 8,276 negative words
  • HowNet Sentiment Dictionary: 4,566 positive words and 4,370 negative words
– A = N×M matrix, A(i,j) = Freq(bi, wj)
– HSF Z = {z1, z2, …, zk}
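The matrix A defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the blogs are taken to be already word-segmented, the function name `build_frequency_matrix` is hypothetical, and the English tokens stand in for entries from a sentiment lexicon such as NTUSD or HowNet.

```python
from collections import Counter


def build_frequency_matrix(blogs, sentiment_words):
    """Return A as N rows by M columns, where A[i][j] = Freq(b_i, w_j).

    blogs: list of token lists (one list per blog, already segmented).
    sentiment_words: list of lexicon words; column order follows this list.
    """
    rows = []
    for tokens in blogs:
        counts = Counter(tokens)  # a missing word counts as 0
        rows.append([counts[w] for w in sentiment_words])
    return rows


# Hypothetical toy data, not taken from the paper.
blogs = [
    ["great", "moving", "film", "great"],
    ["dull", "disappointing", "film"],
]
sentiment_words = ["great", "moving", "disappointing", "dull"]

A = build_frequency_matrix(blogs, sentiment_words)
print(A)
```

Words outside the lexicon (here "film") are ignored, so each row describes a blog only through its sentiment vocabulary.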

Page 3: Chinese Blog Clustering by Hidden Sentiment Factors

Hidden Sentiment Factors (HSF)

Page 4: Chinese Blog Clustering by Hidden Sentiment Factors

Hidden Sentiment Factors (HSF)

P(w|b) -> P(z|b)
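The step from P(w|b) to P(z|b) is what PLSA estimates. The sketch below fits the standard asymmetric PLSA model, P(w|b) = Σ_z P(w|z) P(z|b), with plain EM. The slides do not show which PLSA variant, initialization, or stopping rule the authors used, so the random initialization, the fixed iteration count, and the name `plsa` are all assumptions.

```python
import random


def plsa(A, k, iterations=50, seed=0):
    """Fit asymmetric PLSA by EM on a count matrix A (N blogs x M words).

    Returns (p_z_b, p_w_z): p_z_b[i][z] = P(z|b_i), p_w_z[z][j] = P(w_j|z).
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])

    def random_dist(size):
        vals = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(vals)
        return [v / total for v in vals]

    p_z_b = [random_dist(k) for _ in range(n)]  # assumed random start
    p_w_z = [random_dist(m) for _ in range(k)]

    for _ in range(iterations):
        acc_z_b = [[0.0] * k for _ in range(n)]
        acc_w_z = [[0.0] * m for _ in range(k)]
        for i in range(n):
            for j in range(m):
                if A[i][j] == 0:
                    continue
                # E-step: P(z|b_i, w_j) is proportional to P(w_j|z) * P(z|b_i)
                post = [p_w_z[z][j] * p_z_b[i][z] for z in range(k)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                for z in range(k):
                    weight = A[i][j] * post[z] / norm
                    acc_z_b[i][z] += weight
                    acc_w_z[z][j] += weight
        # M-step: renormalize the expected counts into probabilities.
        # A blog with no sentiment words keeps its previous distribution.
        for i in range(n):
            total = sum(acc_z_b[i])
            if total > 0:
                p_z_b[i] = [v / total for v in acc_z_b[i]]
        for z in range(k):
            total = sum(acc_w_z[z])
            if total > 0:
                p_w_z[z] = [v / total for v in acc_w_z[z]]
    return p_z_b, p_w_z


# Hypothetical toy counts: 2 blogs, 4 sentiment words, k = 2 factors.
p_z_b, p_w_z = plsa([[2, 1, 0, 0], [0, 0, 1, 1]], k=2)
print(p_z_b)
```

Each row of `p_z_b` is a k-dimensional representation of one blog in the hidden-sentiment-factor space, which is the input a clustering step would consume.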

Page 5: Chinese Blog Clustering by Hidden Sentiment Factors

Clustering by HSF

• K-Means Algorithm
– k’: number of clusters. In this paper, k’ = k.
– Fig. 1: Similarity = 0
– Fig. 2: Similarity = ?
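As a rough illustration of the clustering step, here is a minimal k-means over the P(z|b) vectors. The figures referenced above suggest the paper defines its own similarity in HSF space, which the transcript does not preserve, so squared Euclidean distance is used here purely as a stand-in. The function names and toy vectors are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (lists of floats) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical P(z|b) vectors for four blogs with k = 2 factors.
hsf_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(hsf_vectors, k=2)
print(labels)
```

Setting the number of clusters equal to the number of factors, as the slide does with k’ = k, means each cluster can line up with one dominant sentiment factor.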

Page 6: Chinese Blog Clustering by Hidden Sentiment Factors

Label Words Extraction

Page 7: Chinese Blog Clustering by Hidden Sentiment Factors

Experiment

– 1. Collect blog reviews of Stephen Chow’s movie “CJ7” (Long River 7).

– 2. Collect blog entries about Liu Xiang posted since 2008/8/18.

• Tag 1: “Positive”, “Negative”, or “Neutral”
• Tag 2: “Irrelevant” or not

• Ex: A blog may be tagged {“Positive”, “Irrelevant”}, {“Neutral”}, or {“Negative”, “Irrelevant”}.

• Evaluate the clustering purity.
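Clustering purity is commonly computed by crediting each cluster with its most frequent gold tag and dividing the total by the number of blogs. The sketch below uses that standard definition. How the paper treats the second tag (“Irrelevant”) is not shown in the transcript, so a single gold tag per blog is assumed and the sample data is invented for illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Purity = (1/N) * sum over clusters of the size of the majority gold tag."""
    by_cluster = defaultdict(Counter)
    for cluster, gold in zip(cluster_ids, gold_labels):
        by_cluster[cluster][gold] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(gold_labels)


# Hypothetical assignments and tags, not results from the paper.
clusters = [0, 0, 0, 1, 1, 1]
gold = ["Positive", "Positive", "Negative", "Negative", "Negative", "Neutral"]
score = purity(clusters, gold)
print(round(score, 3))
```

Purity rises toward 1.0 as clusters become homogeneous, but it also rises with the number of clusters, so it is most meaningful when comparing methods at the same k.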

Page 8: Chinese Blog Clustering by Hidden Sentiment Factors

Experiment

Page 9: Chinese Blog Clustering by Hidden Sentiment Factors

Experiment

Page 10: Chinese Blog Clustering by Hidden Sentiment Factors

Experiment

Page 11: Chinese Blog Clustering by Hidden Sentiment Factors

Experiment