Chinese Blog Clustering by Hidden Sentiment Factors

Post on 04-Jan-2016

DESCRIPTION

Chinese Blog Clustering by Hidden Sentiment Factors. ADMA 2009. Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang. College of Information Science and Engineering, Northeastern University. Hidden Sentiment Factors (HSF). Probabilistic latent semantic analysis (PLSA).

TRANSCRIPT

Chinese Blog Clustering by Hidden Sentiment Factors

ADMA 2009

Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang

College of Information Science and Engineering, Northeastern University

Hidden Sentiment Factors(HSF)

• Probabilistic latent semantic analysis (PLSA)
– Blog set B = {b1, b2, …, bN}
– Sentiment word set W = {w1, w2, …, wM}
  • NTUSD: 2,812 positive words and 8,276 negative words
  • HowNet Sentiment Dictionary: 4,566 positive words and 4,370 negative words
– A = N×M matrix, A(i,j) = Freq(bi, wj)
– HSF Z = {z1, z2, …, zk}
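The matrix A defined above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each blog has already been segmented into a list of tokens, and the function name `build_frequency_matrix` and the English toy words are hypothetical stand-ins for a real sentiment lexicon such as NTUSD or HowNet.

```python
from collections import Counter

def build_frequency_matrix(blogs, sentiment_words):
    """Return A as N rows of M counts, where A[i][j] = Freq(b_i, w_j)."""
    vocabulary = set(sentiment_words)
    matrix = []
    for tokens in blogs:
        # Count only tokens that appear in the sentiment word set W.
        counts = Counter(t for t in tokens if t in vocabulary)
        matrix.append([counts[w] for w in sentiment_words])
    return matrix

# Toy data (assumption): two pre-tokenized blogs and a four-word lexicon.
blogs = [
    ["good", "movie", "good", "funny"],
    ["bad", "plot", "boring"],
]
sentiment_words = ["good", "funny", "bad", "boring"]
A = build_frequency_matrix(blogs, sentiment_words)
```

Words outside W ("movie", "plot") are ignored, so each row describes a blog only through its sentiment vocabulary.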

Hidden Sentiment Factors(HSF)

P(w|b) -> P(z|b)
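PLSA models P(w|b) as the sum over factors z of P(w|z)·P(z|b), so fitting it yields a k-dimensional vector P(z|b) for each blog. The sketch below uses the standard EM updates for PLSA in plain Python. It is an illustration under assumptions, not the procedure from the paper: the function name `plsa`, the random initialization, and the fixed iteration count are all choices made here.

```python
import random

def normalize(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def plsa(A, k, iterations=50, seed=0):
    """Fit PLSA to count matrix A with EM; return (P(z|b), P(w|z))."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    p_z_b = [normalize([rng.random() + 1e-3 for _ in range(k)]) for _ in range(n)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(m)]) for _ in range(k)]
    for _ in range(iterations):
        new_w_z = [[0.0] * m for _ in range(k)]
        new_z_b = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if A[i][j] == 0:
                    continue
                # E-step: posterior P(z | b_i, w_j) is proportional to P(z|b_i) * P(w_j|z).
                posterior = normalize([p_z_b[i][z] * p_w_z[z][j] for z in range(k)])
                # Accumulate expected counts for the M-step.
                for z in range(k):
                    expected = A[i][j] * posterior[z]
                    new_w_z[z][j] += expected
                    new_z_b[i][z] += expected
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_b = [normalize(row) for row in new_z_b]
    return p_z_b, p_w_z

# Toy count matrix (assumption): 2 blogs, 4 sentiment words, k = 2 factors.
A = [[2, 1, 0, 0], [0, 0, 1, 1]]
p_z_b, p_w_z = plsa(A, k=2)
```

Each row of `p_z_b` is the blog's distribution over hidden sentiment factors, which is the representation passed on to clustering.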

Clustering by HSF

• K-Means Algorithm
– k’: number of clusters. In this paper, k’ = k.
– Fig. 1: Similarity = 0
– Fig. 2: Similarity = ?
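A minimal K-Means sketch over per-blog factor vectors P(z|b) might look like the following. The slides do not state which distance or similarity measure is used, so squared Euclidean distance is an assumption here, as are the function names and the toy vectors.

```python
import random

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy P(z|b) vectors (assumption): four blogs, k = 2 hidden factors.
factor_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(factor_vectors, k=2)
```

With k’ = k as on the slide, the number of clusters equals the number of hidden sentiment factors.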

Label Words Extraction

Experiment

– 1. Collect blog reviews of Stephen Chow’s movie “CJ7” (Long River 7)

– 2. Collect blog entries about Liu Xiang since 2008/8/18.

• Tag 1: “Positive”, “Negative”, or “Neutral”

• Tag 2: “Irrelevant” or not

• Ex: A blog may be tagged {“Positive”, “Irrelevant”}, {“Neutral”}, or {“Negative”, “Irrelevant”}

• Evaluate the clustering purity.
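The slides do not give the purity formula. A common definition, assumed here, credits each cluster with its most frequent human tag and divides the total by the number of blogs. The function name `purity` and the toy labels below are illustrative only.

```python
from collections import Counter

def purity(cluster_labels, true_tags):
    """Fraction of items that carry the majority tag of their cluster."""
    clusters = {}
    for cluster, tag in zip(cluster_labels, true_tags):
        clusters.setdefault(cluster, []).append(tag)
    # For each cluster, count how many members share its most common tag.
    majority_total = sum(
        Counter(tags).most_common(1)[0][1] for tags in clusters.values()
    )
    return majority_total / len(true_tags)

# Toy example (assumption): six blogs in two clusters with hand-assigned tags.
cluster_labels = [0, 0, 0, 1, 1, 1]
true_tags = ["Positive", "Positive", "Negative", "Negative", "Negative", "Neutral"]
score = purity(cluster_labels, true_tags)
```

Here each cluster has two blogs with its majority tag, so four of the six blogs count toward purity.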
