exploiting coherence in reviews for discovering ... - ibm · *ibm research +indian institute of...




Exploiting Coherence in Reviews for Discovering Latent Facets and Associated Sentiments

Himabindu Lakkaraju*, Chiranjib Bhattacharyya+, Indrajit Bhattacharya+, Srujana Merugu*

*IBM Research +Indian Institute of Science


Outline

• Motivation

• Background

• Our Models
– Integrating Syntax and Semantics: FACTS Model
– Incorporating Coherence: CFACTS Model
– Incorporating Review Ratings: CFACTS-R Model

• Experimental Results

• Conclusions and Future Work


Mining Customer Reviews

• Central Problem: facet-based sentiment analysis of customer reviews

• Applications
– E-commerce: product recommendation for customers
– Business Analytics: helping product managers and decision makers understand the product's market standing

Review 1
Facet        Sentiment
Memory       -
Screen       -
Appearance   Positive

Review 2
Facet        Sentiment
Memory       -
Screen       Negative
Appearance   -


Existing Methods

• Feature-based sentiment analysis: Scaffidi et al. (EC '07), Jin et al. (ICML '09)

• Facet extraction and sentiment analysis treated as separate phases
– Facet extraction: Titov et al. (WWW '08)
– Sentiment analysis: Lin et al. (CIKM '09), Li et al. (AAAI '10)

• Rule-based ontologies and facets, simple frequency-based measures: Popescu et al. (EMNLP '05)

Need for:
– Domain-independent, unsupervised or weakly supervised techniques
– Joint modeling of facets and sentiments


Background

• Latent Dirichlet Allocation (Blei et al., JMLR '03)
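The slide names LDA without spelling it out. As a reminder of the standard generative process that the later models build on, here is a minimal sketch: each topic has a word distribution phi_k ~ Dirichlet(beta), each document has topic proportions theta_d ~ Dirichlet(alpha), and every token picks a topic and then a word. The function name and default hyperparameters are illustrative assumptions, not values from the talk.

```python
import numpy as np


def generate_lda_corpus(num_docs, doc_len, num_topics, vocab_size,
                        alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process."""
    rng = np.random.default_rng(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    docs, assignments = [], []
    for _ in range(num_docs):
        # Per-document topic proportions: theta_d ~ Dirichlet(alpha)
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # Each token first picks a topic, then a word from that topic
        z = rng.choice(num_topics, size=doc_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
        assignments.append(z)
    return phi, docs, assignments


phi, docs, z = generate_lda_corpus(num_docs=3, doc_len=20, num_topics=4, vocab_size=50)
print(phi.shape, len(docs))
```

Inference runs this process in reverse: given only the words, recover plausible topic assignments and distributions.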


Background

• HMM-LDA (Griffiths et al., NIPS '04)
– Augments LDA with an HMM over syntactic classes; only one syntactic class has topics
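A minimal sketch of an HMM-LDA-style generative process, assuming the usual setup: a Markov chain over syntactic classes, where class 0 stands in for the single semantic class and draws its words from the document's topics, and every other class emits from its own fixed word distribution. The uniform initial state and all names are illustrative assumptions.

```python
import numpy as np


def generate_hmm_lda_doc(doc_len, trans, class_word, topic_word, alpha, rng):
    """Sample one document from an HMM-LDA-style generative process.

    trans[c_prev, c] : class transition probabilities (rows sum to 1)
    class_word[c]    : word distribution of syntactic class c (row 0 unused)
    topic_word[k]    : word distribution of semantic topic k
    Class 0 is the only class whose words come from the document's topics.
    """
    num_classes = trans.shape[0]
    theta = rng.dirichlet(np.full(topic_word.shape[0], alpha))
    c_prev = rng.integers(num_classes)  # assumption: uniform initial state
    classes, words = [], []
    for _ in range(doc_len):
        c = rng.choice(num_classes, p=trans[c_prev])
        if c == 0:
            k = rng.choice(len(theta), p=theta)  # topic for this token
            w = rng.choice(topic_word.shape[1], p=topic_word[k])
        else:
            w = rng.choice(class_word.shape[1], p=class_word[c])
        classes.append(c)
        words.append(w)
        c_prev = c
    return np.array(classes), np.array(words)
```

Function words such as "the" or "very" end up in the non-topical classes, so the topics stay focused on content words.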


FACeT Sentiment extraction model (FACTS)

FACTS aims to extract both facets and their associated sentiments from customer reviews

• Captures both syntactic and semantic dependencies
• Loosely based on HMM-LDA
• Both the facet class and the sentiment class consist of topics


FACTS Model

Extends HMM-LDA so that topics also exist within a second syntactic class, the 'sentiment' class
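The slide gives only the high-level idea, so the following is a hedged sketch of the emission step that changes relative to HMM-LDA: two classes, facet and sentiment, now draw their words through topics, while the remaining syntactic classes emit directly. The class ids, the per-document facet and sentiment proportions (theta_facet, theta_sent), and the function name are hypothetical; class transitions would work as in HMM-LDA.

```python
import numpy as np

FACET, SENTIMENT = 0, 1  # hypothetical class ids; classes >= 2 are plain syntactic classes


def facts_emit(c, theta_facet, theta_sent, facet_word, sent_word, class_word, rng):
    """Emit one token given its class c (illustrative FACTS-style sketch).

    Returns (topic, word); topic is -1 for classes that have no topics.
    """
    if c == FACET:
        k = rng.choice(len(theta_facet), p=theta_facet)  # facet topic
        return k, rng.choice(facet_word.shape[1], p=facet_word[k])
    if c == SENTIMENT:
        s = rng.choice(len(theta_sent), p=theta_sent)  # sentiment topic
        return s, rng.choice(sent_word.shape[1], p=sent_word[s])
    # Non-topical syntactic class: fixed class-specific word distribution
    return -1, rng.choice(class_word.shape[1], p=class_word[c])
```

With only the FACET branch topical, this reduces to the HMM-LDA emission.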


Coherence-based FACTS Model (CFACTS)

The pictures i took during my last trip with this camera were absolutely great. The picture quality is amazing and the pics come out clear and sharp. I am also very impressed with its battery life, unlike other cameras available in the market, the charge lasts long enough. However, I am unhappy with the accessories.

'Coherence' is an important aspect of user-generated content. In reviews, facet and sentiment coherence are usually prevalent: consecutive sentences tend to discuss the same facet with the same sentiment.


CFACTS Model

Modeling coherence

Each review consists of basic units of coherence, called windows

Each window is associated with a single facet and a single sentiment

Continuity of topics across windows is governed by a coherence parameter
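A minimal sketch of window-level coherence, under a simplifying assumption: the coherence parameter is modeled here as a hypothetical binary switch with probability `stay_prob`, under which a window either keeps the previous window's (facet, sentiment) pair or draws a fresh pair from the review's facet and sentiment proportions. The actual CFACTS parameterization may distinguish more cases.

```python
import numpy as np


def sample_window_topics(num_windows, theta_facet, theta_sent, stay_prob, rng):
    """Assign one (facet, sentiment) pair per window with a continuity switch.

    stay_prob is a hypothetical stand-in for the coherence parameter.
    """
    facets = np.empty(num_windows, dtype=int)
    sents = np.empty(num_windows, dtype=int)
    for x in range(num_windows):
        if x > 0 and rng.random() < stay_prob:
            # Coherent continuation: reuse the previous window's topics
            facets[x], sents[x] = facets[x - 1], sents[x - 1]
        else:
            # Topic shift: draw a fresh facet and sentiment for this window
            facets[x] = rng.choice(len(theta_facet), p=theta_facet)
            sents[x] = rng.choice(len(theta_sent), p=theta_sent)
    return facets, sents
```

A high `stay_prob` produces long runs of windows about the same facet, as in the camera review above, where several sentences in a row discuss picture quality.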


CFACTS Model

Extends FACTS to incorporate coherence in facets and sentiments

Also enables loose coupling of the facet and sentiment classes


Incorporating ratings - CFACTS-R

Example review: "The flash washes out the photos, and the camera takes very long to turn on..."

Review ratings are valuable pointers to the sentiments expressed in reviews. Does incorporating these ratings help us extract sentiments better?

– Review ratings turn out to be of great help for 'ordering' the sentiment topics

Is "flash washes out photos" negative?


Incorporating ratings - CFACTS-R

This model:
• Provides a complete view of a review, incorporating all of its aspects
• Uses the ratings to 'order' the sentiment topics without explicit seed words

The review rating is generated by a normal linear model of the 'individual sentiments'
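One way to read "a normal linear model of individual sentiments", in the spirit of supervised LDA, is r_d ~ N(eta · z̄_d, sigma²), where z̄_d is the fraction of the review's sentiment-class tokens assigned to each sentiment topic. This is a hedged sketch under that assumption. The weights `eta` below are hypothetical; in the model they would be learned, which is also what lets the ratings "order" the sentiment topics.

```python
import numpy as np


def expected_rating(sent_topics, num_sent_topics, eta):
    """Mean rating eta . z_bar, where z_bar is the fraction of the review's
    sentiment tokens assigned to each sentiment topic."""
    sent_topics = np.asarray(sent_topics)
    if sent_topics.size == 0:
        raise ValueError("review has no sentiment-class tokens")
    z_bar = np.bincount(sent_topics, minlength=num_sent_topics) / sent_topics.size
    return float(np.dot(eta, z_bar))


def rating_log_likelihood(rating, sent_topics, num_sent_topics, eta, sigma):
    """log N(rating | eta . z_bar, sigma^2)."""
    mu = expected_rating(sent_topics, num_sent_topics, eta)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (rating - mu) ** 2 / (2 * sigma**2)


eta = np.array([1.0, 3.0, 5.0])  # hypothetical weights: negative, neutral, positive topics
print(expected_rating([2, 2, 0, 2], 3, eta))  # 0.25 * 1.0 + 0.75 * 5.0 = 4.0
```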


Inference

Inference using Gibbs sampling

The samplers for all models were run for about 1000 iterations

Update equations for CFACTS-R:

Block sampling of the facet topic, sentiment topic and coherence parameter:


Inference (contd.)

where g1 can be computed as and

Conditional distribution for the class variable -

where g2 can be computed as and
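The CFACTS-R update equations are not recoverable from this transcript. To illustrate only the underlying technique, here is the standard collapsed Gibbs sampler for plain LDA, not the authors' block sampler. Each token's topic is resampled with probability proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V·beta), using counts that exclude the token itself. The function name and hyperparameters are assumptions. The default of 1000 sweeps mirrors the iteration count mentioned on the slide.

```python
import numpy as np


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        num_iters=1000, seed=0):
    """Collapsed Gibbs sampling for plain LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), num_topics))  # topic counts per document
    n_kw = np.zeros((num_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(num_topics)                 # total tokens per topic
    z = [rng.integers(num_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(num_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                # Add it back under the newly sampled topic
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

Block sampling, as described for CFACTS-R, would instead resample several coupled variables together, for example a window's facet topic, sentiment topic and coherence parameter, from their joint conditional.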


Experimental Results

Dataset – Amazon reviews crawled during Nov. 2009

Model performance was evaluated on several tasks:

• Facet Extraction

• Sentiment Identification at multiple granularities (document, sentence, word)

• Facet based opinion summarization

Product Category    # of Reviews
Digital Cameras     61482
Laptops             10011
Mobile Phones       6348
Flatpanel TVs       2346
Printers            2397


Evaluation – Facet Extraction

• Qualitative Evaluation

• Quantitative Evaluation
– Facet Coverage: the fraction of extracted facets that actually correspond to product attributes, benchmarked against Amazon's structured-rating facets
– Facet Purity: the fraction of the top words in a facet that actually correspond to the product attribute
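The two metrics above reduce to simple ratios once the judgments exist. This sketch assumes each extracted facet has already been mapped by hand to an attribute label (or `None`), and that a hypothetical list of words relevant to that attribute is available. How the talk obtained these judgments is not stated here.

```python
def facet_coverage(facet_labels, reference_attributes):
    """Fraction of extracted facets whose manually assigned label is one of
    the reference product attributes (None means 'matches no attribute')."""
    matched = sum(label in reference_attributes for label in facet_labels)
    return matched / len(facet_labels)


def facet_purity(top_words, attribute_words):
    """Fraction of a facet's top words that belong to its product attribute."""
    return sum(w in attribute_words for w in top_words) / len(top_words)


reference = {"battery", "lens", "price", "zoom"}  # hypothetical reference facets
print(facet_coverage(["battery", "lens", None, "price"], reference))  # 0.75
print(facet_purity(["battery", "charge", "life", "great", "lasts"],
                   {"battery", "charge", "life", "lasts"}))  # 0.8
```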

Digital Camera Corpus


Evaluation – Prior Knowledge

Prior Knowledge for FACTS and CFACTS: seed words for the different sentiment levels. These seed words help distinguish
– the facet class from the sentiment class
– the sentiment topics from one another

Prior Knowledge for FACTS-R and CFACTS-R:
– Review ratings
– Seed words for the sentiment class as a whole, with no seeding of individual sentiment topics
– Seed words for the sentiment class: great, amazing, good, easy, like, bad, terrible, poor
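The slides do not say how seed words enter the models. One common option in topic models is an asymmetric Dirichlet prior that adds extra pseudo-counts to the seed words of a designated row (a single sentiment topic, or the sentiment class as a whole). The sketch below assumes that approach. The `base` and `boost` values and the row layout are hypothetical.

```python
import numpy as np

# Sentiment-class seed words listed on the slide
SENTIMENT_SEEDS = ["great", "amazing", "good", "easy", "like", "bad", "terrible", "poor"]


def seeded_prior(vocab, num_rows, seeds, base=0.01, boost=1.0):
    """Asymmetric Dirichlet prior: each seeded row gets extra mass on its seeds.

    seeds maps a row index (a topic, or a whole class) to a list of seed
    words; words missing from the vocabulary are skipped.
    """
    index = {w: i for i, w in enumerate(vocab)}
    prior = np.full((num_rows, len(vocab)), base)
    for row, words in seeds.items():
        for w in words:
            if w in index:
                prior[row, index[w]] += boost
    return prior


vocab = ["great", "battery", "poor", "lens", "terrible"]
# Hypothetical layout: row 0 = facet class, row 1 = sentiment class as a whole
prior = seeded_prior(vocab, num_rows=2, seeds={1: SENTIMENT_SEEDS})
```

Class-level seeding (the -R variants) only separates sentiment words from facet words. Per-level seeding would use one row per sentiment topic, each with its own seed list.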


Evaluation – Sentiment Analysis

Word and Sentence Level
– Positive, negative and neutral sentiments
– Ground truth for word level: SentiWordNet
– Ground truth for sentence level: 8,012 manually labeled sentences

Review Level
– Positive and negative
– Five sentiment ratings (on a scale of 1 to 5)
– Ground truth: Amazon review ratings

Baseline: joint topic-sentiment model for sentiment analysis (Lin et al., CIKM '09)


Evaluation – Facet-based Sentiment Analysis

We manually labeled 1500 reviews with (facet, polarity) pairs
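The slide does not name the scoring metric for the labeled (facet, polarity) pairs. A natural choice, assumed here, is micro-averaged precision, recall and F1 over the pairs predicted for each review compared with the gold pairs. The example labels are hypothetical.

```python
def pair_prf(predicted, gold):
    """Micro-averaged precision/recall/F1 over (facet, polarity) pairs.

    predicted, gold: lists with one set of (facet, polarity) pairs per review.
    """
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = [{("screen", "negative"), ("battery", "positive")}, {("price", "positive")}]
gold = [{("screen", "negative")}, {("price", "positive"), ("lens", "positive")}]
print(pair_prf(predicted, gold))  # 2 of 3 predictions correct, 2 of 3 gold pairs found
```

A pair counts as correct only if both the facet and its polarity match, so a correct facet with the wrong sentiment is penalized.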

Baselines:

– FIFS: a simple rule-based feature and polarity word extractor that we implemented; feature terms are then grouped into facets using the PMI metric

– LFS: an LDA-based algorithm to identify facets and sentiments
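FIFS groups feature terms into facets with PMI, but the exact procedure is not given. This sketch assumes document-level co-occurrence, PMI(a, b) = log(P(a, b) / (P(a) P(b))), where each probability is the fraction of reviews containing the term(s). It also assumes a hypothetical grouping rule: assign each term to the facet "head" term with which it has the highest PMI.

```python
import math


def doc_pmi(docs, a, b):
    """Document-level PMI of terms a and b; docs is a list of term sets."""
    n = len(docs)
    n_a = sum(a in d for d in docs)
    n_b = sum(b in d for d in docs)
    n_ab = sum(a in d and b in d for d in docs)
    if n_ab == 0:
        return float("-inf")  # the terms never co-occur
    return math.log(n_ab * n / (n_a * n_b))


def group_by_pmi(docs, terms, heads):
    """Hypothetical grouping rule: attach each term to its highest-PMI head."""
    return {t: max(heads, key=lambda h: doc_pmi(docs, t, h)) for t in terms}


docs = [{"battery", "charge"}, {"battery", "charge", "screen"},
        {"screen", "display"}, {"display", "screen"}]
print(group_by_pmi(docs, ["charge", "display"], ["battery", "screen"]))
# {'charge': 'battery', 'display': 'screen'}
```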


Summary

Main Contributions
● Introduce the notions of facet and sentiment coherence
● Probabilistic models for facet-based sentiment analysis that discover latent facet topics and the corresponding sentiment ratings
● Domain-independent and unsupervised, requiring no expert intervention

Future Work
● Faster inference for the proposed models
● Extending the approaches to handle hierarchies of facets


Thank You!

Contact Author: Himabindu Lakkaraju ([email protected])
Project Webpage: http://mllab.csa.iisc.ernet.in/downloads/reviewmining.html