FAKE REVIEW DETECTION
What are they saying about you? Are they real…
Guided By : Dr. Animesh Mukherjee
ONLINE REVIEW
• Captures testimonials of “real” people (unlike advertisements).
• Shapes the decision making of customers.
• Positive reviews bring financial gains and fame for a business.
• Deceptive opinion spamming: writing fake reviews to promote or discredit target products and services.[1]
• Opinion spammers have admitted to being paid to write fake reviews. (Kost, 2012)
• Yelp.com “sting operation”: publicly shame businesses who buy fake reviews.[2]
[1] Jindal and Liu 2008 : http://www.cs.uic.edu/~liub/FBS/opinion-spam-WSDM-08.pdf
[2] Yelp official blog : http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html
Consumer Alerts !!
• Amazon Mechanical Turk: crowdsourced online workers (turkers) were paid to write fake reviews ($1 per review) portraying 20 Chicago hotels in a positive light.[1]
• 400 fake positive reviews were collected this way, along with 400 non-fake reviews on the same 20 hotels from Tripadvisor.com.
• Yelp’s filtered and unfiltered reviews were collected to understand the workings of Yelp’s review classification algorithm.
• Approach: linguistic n-gram features with a supervised learning method.[2]
[1] Amazon Mechanical Turk : https://www.mturk.com/
[2] Ott et al. 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
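The n-gram-plus-supervised-learning approach can be illustrated with a minimal sketch: a from-scratch multinomial Naive Bayes over bigram features (Ott et al. evaluated Naive Bayes and SVM classifiers on the real AMT corpus; the toy reviews below are invented for illustration).

```python
import math
from collections import Counter

def bigrams(text):
    """Lowercase tokens -> list of adjacent word pairs (bigram features)."""
    toks = text.lower().split()
    return [(a, b) for a, b in zip(toks, toks[1:])]

class NaiveBayes:
    """Multinomial Naive Bayes over bigram counts with add-one smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(bigrams(doc))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_post(c):
            # Smoothed log-likelihood of the doc's bigrams under class c.
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log(self.prior[c])
            for bg in bigrams(doc):
                lp += math.log((self.counts[c][bg] + 1) / total)
            return lp
        return max(self.classes, key=log_post)

# Invented toy examples, not the AMT corpus:
fake = ["my stay was amazing and the staff was amazing",
        "amazing hotel amazing deal we loved our stay"]
real = ["the room was clean but the elevator was slow",
        "check in took a while and the room was small"]
clf = NaiveBayes().fit(fake + real, ["fake"] * 2 + ["real"] * 2)
print(clf.predict("an amazing stay"))
```

In practice a library classifier over a large n-gram vocabulary would be used; this sketch only shows the feature representation and decision rule.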
Dataset Collection and Approach for Analysis
Linguistic Approach : Results
• Using only bigram features: accuracy of 89.6% on AMT data.[1]
• Using the same n-gram features: accuracy of 67.8% on Yelp data.[2]
• Table 1: the class distribution of the Yelp data is skewed; imbalanced data produces a poor model. (Chawla et al., 2004)
• A good remedy for imbalanced data is undersampling. (Drummond and Holte, 2003)
[1] Ott et al. 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
[2] A Mukherjee - 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
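The undersampling remedy (Drummond and Holte, 2003) can be sketched as randomly dropping majority-class examples until the classes balance; the helper below is a minimal illustration, not the authors' exact procedure.

```python
import random

def undersample(X, y, seed=0):
    """Random undersampling: keep an equal number of examples per class
    by sampling each class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = min(len(v) for v in by_class.values())
    Xb, yb = [], []
    for c, items in by_class.items():
        for xi in rng.sample(items, n):
            Xb.append(xi)
            yb.append(c)
    return Xb, yb

# Skewed toy data: 8 non-fake vs. 2 fake examples.
X = list(range(10))
y = ["non-fake"] * 8 + ["fake"] * 2
Xb, yb = undersample(X, y)
print(yb.count("non-fake"), yb.count("fake"))
```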
Linguistic Approach Results : Explained
• For the AMT data, the word distributions of fake and non-fake reviews are very different, which explains the high accuracy using n-grams.
• Reason for the different word distributions: absence of domain knowledge and little incentive in writing fake reviews ($1 per review).
• n-grams perform poorly on Yelp data because spammers, according to the Yelp filter, used very similar language in fake reviews as in non-fake reviews: the two are linguistically very similar.[1]
• The inefficiency of linguistic features in detecting the fake reviews filtered by Yelp motivates a behavioral study of reviewers.
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Information Theoretic Analysis
• To explain huge accuracy difference : analysis of word distribution of AMT and
Yelp data.
• Good-Turing smoothed unigram language models.
• Computation of the word distribution difference across fake and non-fake reviews uses the Kullback-Leibler (KL) divergence:
$$KL(F\|N) = \sum_i F(i)\,\log_2\frac{F(i)}{N(i)}$$
where F(i) and N(i) are the respective probabilities of word i in fake and non-fake reviews.[1]
• Here KL(F||N) gives a quantitative estimate of how much fake reviews linguistically differ from non-fake reviews.
[1] Kullback Leibler Divergence : http://www.cs.buap.mx/~dpinto/research/CICLing07_1/Pinto06c/node2.html
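The KL divergence computation can be sketched in Python. Note the slides use Good-Turing smoothed unigram language models; this sketch substitutes simpler add-one smoothing over a shared vocabulary so that both distributions assign nonzero probability to every word.

```python
import math
from collections import Counter

def unigram_probs(tokens, vocab):
    """Add-one smoothed unigram probabilities over a shared vocabulary.
    (A stand-in for the Good-Turing smoothing used in the slides.)"""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl(F, N):
    """KL(F||N) = sum_i F(i) * log2(F(i)/N(i))."""
    return sum(F[w] * math.log2(F[w] / N[w]) for w in F)

# Toy token streams, invented for illustration:
fake_toks = "great great stay great deal great price".split()
real_toks = "room was fine location was ok".split()
vocab = set(fake_toks) | set(real_toks)
F = unigram_probs(fake_toks, vocab)
N = unigram_probs(real_toks, vocab)
print(kl(F, N), kl(N, F))  # the two values differ: KL is asymmetric
```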
Information Theoretic Analysis (Cont..)
• KL divergence is asymmetric: $KL(F\|N) \neq KL(N\|F)$.
• We define $\Delta KL = KL(F\|N) - KL(N\|F)$.
• For AMT data: $KL(F\|N) \approx KL(N\|F)$ and $\Delta KL \approx 0$.
• For Yelp data: $KL(F\|N) \gg KL(N\|F)$ and $\Delta KL > 1$.[1]
[1] Mukherjee et al., 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Information Theoretic Analysis (Cont..)
• The definition of KL(F||N) implies that words having high probability in F and very low probability in N contribute most to the KL divergence.
• To study the word-wise contribution to ΔKL, a per-word ΔKL is calculated as:
$$\Delta KL_{word}(i) = KL_{word}(i)(F\|N) - KL_{word}(i)(N\|F)$$
where
$$KL_{word}(i)(F\|N) = F(i)\,\log_2\frac{F(i)}{N(i)}$$
• The contribution of the top k words to ΔKL is computed for k = 200 and k = 300.
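The per-word ΔKL contribution can be sketched directly from its definition; the probabilities below are hypothetical smoothed values, not real data.

```python
import math

def delta_kl_word(i, F, N):
    """Per-word contribution: KL_word(i)(F||N) - KL_word(i)(N||F)."""
    return F[i] * math.log2(F[i] / N[i]) - N[i] * math.log2(N[i] / F[i])

# Hypothetical smoothed word probabilities in fake (F) and non-fake (N):
F = {"amazing": 0.5, "room": 0.2, "clean": 0.3}
N = {"amazing": 0.1, "room": 0.5, "clean": 0.4}

# Rank words by their contribution to ΔKL (top-k analysis in the slides).
contrib = sorted(F, key=lambda w: delta_kl_word(w, F, N), reverse=True)
print(contrib)
```

A word overused in fake reviews ("amazing" here) gets a large positive contribution; words more frequent in non-fake reviews get negative ones.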
Turkers didn’t do a good job at “Faking”!
Figure: word-wise difference of KL divergence across the top 200 words. (Equally dense: |E| = |G|)
• The symmetric distribution of $\Delta KL_{word}(i)$ for the top k words in the AMT data implies the existence of two sets of words:
1. a set E of words appearing more in fake reviews than in non-fake, $\forall i \in E,\ F(i) > N(i)$, resulting in $\Delta KL_{word}(i) > 0$;
2. a set G of words appearing more in non-fake reviews than in fake, $\forall i \in G,\ N(i) > F(i)$, resulting in $\Delta KL_{word}(i) < 0$.
• Additionally, the top k = 200 words contribute only 20% to ΔKL for the AMT data: there are many words in the AMT data having higher probability in fake than non-fake reviews, and vice versa.
• This implies that fake and non-fake reviews in the AMT data consist of words with very different frequencies. Turkers didn’t do a good job of faking.[1]
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Yelp Spammers are Smart but Overdid “Faking”!
• The Yelp fake review data shows KL(F||N) is much larger than KL(N||F), and ΔKL > 1.
• Graphs (b)-(e) show that among the top 200 words, which contribute the majority (≈70%) of ΔKL, most words have $\Delta KL_{word}(i) > 0$ and only a few have $\Delta KL_{word}(i) < 0$.
Figure: word-wise difference of KL divergence across the top 200 words.[1]
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Yelp Spammers are Smart but Overdid “Faking”! (Cont..)
• Let A be the set of top words contributing most to ΔKL. We partition A as $A = A_F \cup A_N$ with $A_F \cap A_N = \emptyset$, where
$$A_F = \{\, i \in A \mid \Delta KL_{word}(i) > 0 \,\} \quad (\text{i.e., } F(i) > N(i))$$
$$A_N = \{\, i \in A \mid \Delta KL_{word}(i) < 0 \,\} \quad (\text{i.e., } N(i) > F(i))$$
• The curve above y = 0 is dense and below it is sparse, which implies $|A_F| \gg |A_N|$.
• $|A_F| \gg |A_N|$ clearly indicates that there exist specific words which contribute most to ΔKL by appearing in fake reviews with much higher frequencies than in non-fake reviews.
• Spammers made a smart effort to ensure that their fake reviews mostly contain words that also appear in non-fake reviews, to sound convincing.
Yelp Spammers are Smart but Overdid “Faking”! (Cont..)
• While making their reviews sound convincing, spammers psychologically happened to OVERUSE some words, resulting in higher frequencies of certain words in fake reviews than in non-fake reviews.
• A quick lookup yields {us, price, stay, feel, deal, comfort} in the hotel domain and {options, went, seat, helpful, overall, serve, amount, etc.} in restaurants.
• Prior work on personality has shown that deception/lying usually involves more use of personal pronouns (e.g., us) and associated actions (e.g., went, feel) toward specific targets (e.g., option, price, stay) with the objective of incorrect projection (lying or faking).[1]
• Spammers caught by Yelp left behind linguistic footprints which can be detected by a precise behavioral study.
[1] Newman et al , 2003 : http://www.communicationcache.com/uploads/1/0/8/8/10887248/lying_words-_predicting_deception_from_linguistic_styles.pdf
Spamming Behavior Analysis
1. Maximum Number of Reviews (MNR): writing too many reviews in a day is abnormal.
2. Percentage of Positive Reviews (PR): deception words in fake reviews indicate projection in a positive light. The CDF of positive (4-5 star) reviews among all reviews is plotted to illustrate the analysis.
3. Review Length (RL): when writing up fake experiences there is probably not much to write, and the spammer does not want to spend much time on it.
4. Maximum Content Similarity (MCS): to examine whether some posted reviews are similar to previous reviews, we compute the cosine similarity between pairs of a reviewer’s reviews. Non-spammers mostly write new content.
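The MCS feature can be sketched with plain bag-of-words cosine similarity over a reviewer's reviews (the paper may weight or preprocess terms differently; the reviews below are invented).

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def max_content_similarity(reviews):
    """MCS: the highest pairwise similarity among a reviewer's reviews,
    capturing the worst (most copy-like) behavior."""
    return max((cosine(r1, r2)
                for i, r1 in enumerate(reviews)
                for r2 in reviews[i + 1:]), default=0.0)

reviews = ["great product great price",
           "great product great price really",
           "arrived late but works fine"]
print(round(max_content_similarity(reviews), 2))
```

A near-duplicate pair drives MCS close to 1, flagging copy-paste behavior, while a reviewer who writes fresh content each time stays near 0.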
Figure: spamming behavior analysis CDFs (Maximum Number of Reviews, Percentage of Positive Reviews, Review Length, Maximum Content Similarity).
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Challenges with Supervised Evaluation
• It is very difficult to find gold-standard data of fake and non-fake reviews for model building: manually recognizing/labeling fake versus non-fake reviews by mere reading is too difficult.
• Duplicate and near-duplicate reviews are assumed to be fake, which is unreliable.
• Manually labeled datasets raise reliability issues, because the accuracy of human labeling of fake reviews has been shown to be very poor.[1]
• AMT-crowdsourced paid reviews are fake, yet they don’t reflect the dynamics of fake reviews on commercial websites.[2]
• This lack of labeled data motivates unsupervised methods of classification.[3]
[1] Ott M, Choi, Y, Cardie, C. and Hancock, J.T. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
[2] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
[3] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
Unsupervised Evaluation Model
• Since human labeling for supervised learning is difficult, the problem was reformulated by modeling spamicity (the degree of spamming) as latent, with other behavioral features observed.
• An unsupervised model, the Author Spamicity Model (ASM), was proposed.[1]
• It takes a fully Bayesian approach and formulates opinion spam detection as a clustering problem.
• Opinion spammers have different behavioral distributions than non-spammers.
• This causes distributional divergence between the latent population distributions of two clusters: spammers and non-spammers.[1]
• Model inference learns the population distributions of the two clusters.
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
• Formulates spam detection as an unsupervised clustering problem in a Bayesian setting.
• Belongs to the class of generative models for clustering based on a set of observed features.
• Models the spamicity 𝑠𝑎 (in the range [0, 1]) of an author a, and the spam label 𝜋𝑟 of a review, which is the class variable reflecting cluster membership (two clusters, K = 2: spam and non-spam).[1]
• Each author/reviewer and each review has a set of observed features (behavioral clues).
• Certain characteristics of abnormal behavior, likely linked with spamming, are defined and exploited in the model for learning the spam and non-spam clusters.[1]
Author Spamicity Model
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
Author Features
• Each feature has a value in the range [0, 1]; a value close to 1 indicates spamming.
1. Content Similarity: crafting a new review every time is time consuming, so spammers are likely to copy reviews across similar products. The maximum similarity is chosen to capture the worst spamming behavior.[1]
2. Maximum Number of Reviews: posting many reviews in a day is abnormal.
3. Reviewing Burstiness: spammers are usually not longtime members of a site. Defined over an activity window (first to last review posting date). Reviews posted over a reasonably long timeframe probably indicate normal activity, but reviews all posted within a short burst are likely spam.[2]
4. Ratio of First Reviews: people mostly rely on early reviews, and early spamming hugely impacts sales, so spammers try to be among the first reviewers.[1]
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
[2] Mukherjee, A., Liu, B. and Glance, N. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. WWW (2012).
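Two of the author features above (MNR and burstiness) can be sketched as follows; the cap and window used to normalize into [0, 1] are illustrative assumptions, not the paper's exact normalizers.

```python
from datetime import date

def author_features(review_dates, window_days=28, mnr_cap=10):
    """Sketch of two author features in [0, 1] (1 = more spam-like).
    window_days and mnr_cap are illustrative normalizers, not the paper's."""
    # Maximum Number of Reviews: most reviews posted on any single day, capped.
    per_day = {}
    for d in review_dates:
        per_day[d] = per_day.get(d, 0) + 1
    mnr = min(max(per_day.values()) / mnr_cap, 1.0)
    # Reviewing Burstiness: 0 if activity spans a long window,
    # near 1 if all reviews fall in a short burst.
    span = (max(review_dates) - min(review_dates)).days
    burst = 0.0 if span > window_days else 1 - span / window_days
    return {"MNR": mnr, "BST": burst}

# An author posting 4 reviews in one day and 1 the next: bursty, high MNR.
dates = [date(2013, 5, 1)] * 4 + [date(2013, 5, 2)]
print(author_features(dates))
```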
Review Features
• There are five binary review features; a value of 1 indicates spamming, 0 non-spamming.
1. Duplicate/Near-Duplicate Reviews (DUP): spammers often post multiple duplicate or near-duplicate reviews on the same product to boost ratings.
2. Extreme Rating: spammers mostly give extreme ratings (1 or 5 stars) in order to demote or promote products.
3. Rating Deviation: spammers usually project a product incorrectly, either positively or negatively, so their ratings deviate from the average given by other reviewers.
4. Early Time Frame: early reviews can greatly impact people’s sentiment about a product.
5. Rating Abuse: multiple ratings on the same product are unusual. Similar to DUP, but focuses on the rating dimension rather than content.
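Three of these binary features can be sketched as simple threshold rules; the threshold values and the short feature names used here are illustrative assumptions, not the paper's exact definitions.

```python
def review_flags(rating, product_avg, post_day, product_first_day,
                 dev_threshold=2.0, early_days=30):
    """Sketch of three binary review features (1 = spam-like, 0 = not).
    dev_threshold and early_days are illustrative, not the paper's values."""
    return {
        # Extreme Rating: 1- or 5-star ratings used to demote/promote.
        "EXT": 1 if rating in (1, 5) else 0,
        # Rating Deviation: far from the average of other reviewers.
        "DEV": 1 if abs(rating - product_avg) > dev_threshold else 0,
        # Early Time Frame: posted soon after the product's first review.
        "ETF": 1 if (post_day - product_first_day) <= early_days else 0,
    }

# A 5-star review on a poorly rated product, posted early: all flags fire.
print(review_flags(rating=5, product_avg=2.4, post_day=12, product_first_day=0))
```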
CONCLUSIONS
• We presented an in-depth investigation of the nature of fake reviews in the commercial setting of Yelp.com.
• Our study covered the linguistic methods of (Ott et al., 2011) and their high accuracy on AMT data.
• We presented a behavioral study of spammers for real-life fake reviews.
• We presented a brief introduction to n-gram language models.
• We presented the challenges with supervised evaluation and sketched the unsupervised approach to evaluation.
• We presented a brief introduction to the unsupervised Author Spamicity Model (ASM).
REFERENCES
1. Jindal and Liu, 2008 : http://www.cs.uic.edu/~liub/FBS/opinion-spam-WSDM-08.pdf
2. Yelp official blog : http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html
3. MIT N-gram Language Model Tutorial : http://web.mit.edu/6.863/www/fall2012/readings/ngrampages.pdf
4. Amazon Mechanical Turk : https://www.mturk.com/
5. Ott et al., 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
6. Mukherjee et al., 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
7. Kullback-Leibler Divergence : http://www.cs.buap.mx/~dpinto/research/CICLing07_1/Pinto06c/node2.html
8. Newman et al., 2003 : http://www.communicationcache.com/uploads/1/0/8/8/10887248/lying_words-_predicting_deception_from_linguistic_styles.pdf
9. Mukherjee, A., Liu, B. and Glance, N. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. WWW (2012).
10. Mukherjee et al. : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
11. Ott, M., Choi, Y., Cardie, C. and Hancock, J.T. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
Any Questions?
Thank You !!