Domain-based Lexicon Enhancement for Sentiment Analysis

A. Muhammad, N. Wiratunga, R. Lothian, R. Glassey

IDEAS Research Institute, Robert Gordon University, Aberdeen

Introduction

• Sentiment Classification

• Sentiment Analysis
  – A wider task, involves identification of
    • Object/Aspects
    • Opinion holder
    • Time

[Diagram: Text, Sentiment Classification]

BCS-SGAI-SMA-2013, Cambridge UK

Sentiment Classification

• Machine Learning

The movie is good : +
The movie is horrible : -
I don’t like the movie : -
I love the movie : +
…

Classifier, e.g. NB, SVMs

Model

The movie is nice : ?
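The pipeline on this slide (labelled examples, a classifier, a model, then a prediction for an unseen sentence) can be sketched with a tiny Naive Bayes classifier. This is only an illustration: the whitespace tokenizer, the add-one smoothing and the choice to skip unseen words are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

# The four labelled examples shown on the slide.
TRAIN = [
    ("The movie is good", "+"),
    ("The movie is horrible", "-"),
    ("I don't like the movie", "-"),
    ("I love the movie", "+"),
]

def tokenize(text):
    # Assumption: lower-cased whitespace tokens are good enough for a sketch.
    return text.lower().split()

def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"+": Counter(), "-": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["+"]) | set(word_counts["-"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the class with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("+", "-"):
        logp = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # unseen words carry no evidence either way
            count = word_counts[label][word] + 1  # add-one smoothing
            logp += math.log(count / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

model = train(TRAIN)
print(classify("The movie is good", model))
print(classify("The movie is nice", model))
```

The word "nice" never occurs in the training examples, so the model has no evidence about it and the second prediction rests only on the remaining words. That is the situation the "?" on the slide points at.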

Sentiment Classif… Cont’d

• Lexicon-Based

Contextual analysis/Aggregation

The movie is nice : ?
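A lexicon-based classifier needs no training data: it looks each word up in a sentiment lexicon, applies some contextual rule and aggregates the scores. The sketch below assumes a hypothetical four-word lexicon with made-up scores, a simple negation rule as the contextual analysis, and summation as the aggregation; none of these choices come from the slides.

```python
# Hypothetical lexicon: term -> (positive score, negative score).
# The values are invented for illustration only.
LEXICON = {
    "nice": (0.75, 0.0),
    "good": (0.625, 0.0),
    "horrible": (0.0, 0.75),
    "like": (0.5, 0.0),
}

# Assumed negation cues; a negator flips the next lexicon word it precedes.
NEGATORS = {"not", "don't", "never"}

def classify(text):
    """Sum positive and negative scores and return '+', '-' or '?'."""
    pos_total = neg_total = 0.0
    negate = False
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            pos, neg = LEXICON[token]
            if negate:
                pos, neg = neg, pos  # contextual rule: swap polarity
                negate = False
            pos_total += pos
            neg_total += neg
    if pos_total == neg_total:
        return "?"  # no evidence, or a tie
    return "+" if pos_total > neg_total else "-"

print(classify("The movie is nice"))
print(classify("I don't like the movie"))
```

Because "nice" is in the lexicon, the sentence that had no training evidence in the machine learning setting can still be scored here.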

Lexicon Generation

Manual, Corpus, Dictionary

• Could be too narrow
• Could be too general
• Ugh!! this movie sucks!
• This movie is fantastic

Sentiment Lexicons

• Dictionary-based: SentiWordNet (Baccianella et al., 2010)

• Corpus-based
  – Generated from target domain
  – Existing approaches rely on well-formed spelling/grammar

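[Diagram: corpus-based lexicon generation. A seed set (good, bad, terrible, nice) is expanded over a corpus to new words (horrible, happy, affordable, rubbish, enjoyable, mad, …), either through the conjunctions "and" and "but" (Hatzivassiloglou and McKeown, 1997) or through co-occurrence with "Excellent" and "Poor" (Turney, 2002).]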
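The co-occurrence idea attributed to Turney (2002) can be sketched as a semantic orientation score: a word leans positive if it co-occurs more with "excellent" than with "poor", after correcting for how common the two seed words are. The function name, the hypothetical counts and the default smoothing constant are assumptions for illustration.

```python
import math

def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """log2 ratio of co-occurrence with 'excellent' versus 'poor'.

    near_excellent / near_poor: how often the word occurs near each seed.
    hits_excellent / hits_poor: how often each seed occurs overall.
    smoothing: small constant that avoids division by zero (assumed value).
    """
    return math.log2(
        ((near_excellent + smoothing) * hits_poor)
        / ((near_poor + smoothing) * hits_excellent)
    )

# Hypothetical counts: a word seen 8 times near "excellent", 2 near "poor".
print(semantic_orientation(8, 2, 100, 100, smoothing=0.0))  # 2.0
```

A positive value suggests positive orientation and a negative value the opposite. Counts like these depend on well-formed text, which is the limitation noted for existing corpus-based approaches.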

Corpus-based lexicon

• Distant-Supervision (Read, 2005; Go et al., 2009)
  – Automated approach for labelling
  – Based on appearance of emoticons

I’m happy with chocolate on vday

I’m at work today
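Distant supervision by emoticons can be sketched as follows. The emoticons on the slide did not survive extraction, so the emoticon sets below, and the emoticons attached to the two example tweets, are assumptions made for illustration.

```python
# Assumed emoticon sets; the slide's own emoticons are not recoverable.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS

def label_tweet(tweet):
    """Return (text without emoticons, label), or None if the tweet has
    no emoticon or has emoticons of both polarities."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:
        return None  # ambiguous or unlabelled, so it is skipped
    text = " ".join(t for t in tokens if t not in ALL_EMOTICONS)
    return (text, "+" if has_pos else "-")

print(label_tweet("I'm happy with chocolate on vday :)"))
print(label_tweet("I'm at work today :("))
```

The emoticon is removed from the text after labelling so that it acts only as the label and not as a feature.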

Scores Generation

• Proportion-based

– Scores are compatible with SentiWordNet

$$ds(t)_c = \frac{\sum_{d_c} tf(t)}{\sum_{AllDocs} tf(t)}$$

Term     + score   - score
ugh      0.077     0.923
sucks    0.132     0.868
luv      0.958     0.042
xoxo     0.792     0.208
…        …         …
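Proportion-based scoring can be sketched as below: a term's score for a class is its frequency in documents of that class divided by its frequency in all documents, so the two scores of a term sum to 1, as in the table. The toy tweets and the whitespace tokenizer are assumptions, and the scores shown on the slide are not reproduced by this toy data.

```python
from collections import Counter

def domain_scores(labelled_docs):
    """Map each term to (positive proportion, negative proportion)."""
    tf = {"+": Counter(), "-": Counter()}
    for text, label in labelled_docs:
        tf[label].update(text.lower().split())
    scores = {}
    for term in set(tf["+"]) | set(tf["-"]):
        total = tf["+"][term] + tf["-"][term]
        scores[term] = (tf["+"][term] / total, tf["-"][term] / total)
    return scores

# Hypothetical distantly supervised tweets.
docs = [
    ("luv this movie", "+"),
    ("luv luv it", "+"),
    ("ugh this movie sucks", "-"),
    ("ugh luv is dead", "-"),
]
scores = domain_scores(docs)
print(scores["luv"])  # (0.75, 0.25)
print(scores["ugh"])  # (0.0, 1.0)
```

Keeping the scores in the range 0 to 1 with a positive and a negative component is what makes them compatible with SentiWordNet-style scores.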

Integration with SentiWordNet

• General Scores are extracted from SentiWordNet

$$gs(t)_c = \frac{1}{|senses(t, PoS)|} \sum_{i=1}^{|senses(t, PoS)|} ScoreSense_i(t)_c$$
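The general score of a term can be sketched as the average of its sense-level scores for a given part of speech. The sketch uses an in-memory list of hypothetical (positive, negative) sense scores in place of a real SentiWordNet lookup; the values are invented for illustration.

```python
def general_score(sense_scores):
    """Average (positive, negative) scores over all senses of a term
    for one part of speech. Returns None if the term has no senses."""
    n = len(sense_scores)
    if n == 0:
        return None
    pos = sum(p for p, _ in sense_scores) / n
    neg = sum(q for _, q in sense_scores) / n
    return (pos, neg)

# Hypothetical sense scores for one adjective with four senses.
senses = [(0.875, 0.0), (0.75, 0.0), (0.5, 0.125), (0.625, 0.0)]
print(general_score(senses))  # (0.6875, 0.03125)
```

Averaging treats every sense as equally likely, which avoids word-sense disambiguation at the cost of diluting the dominant sense.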

Evaluation

• 20,000 Dist-Sup tweets used to:
  – Generate domain lexicon
  – Train Machine Learning classifiers
    • For comparison

• 359 hand-labelled tweets used for evaluation

Evaluation Cont’d

• Individual lexicons vs Combined
  – General < Domain < Combined
    • Difference not significant between Domain and Combined

• Machine learning vs Combined
  – SVM < NB < LogReg < Combined
    • Difference not significant between LogReg and Combined

Evaluation Cont’d

• Varying data sizes
  – Performance improves with increasing size for all except SVM

Conclusions

• Sentiment lexicon is generated using distant-supervision

• Sentiment classification improves with combination of domain-dependent and domain-independent lexicons

• Accuracy of the combination is better than machine learning

Future work

• Lexicon refinement
• Improve aggregation strategy
• Extend approach to other social media platforms
• Extend Dist-Sup to neutral labelling
• Experiment with ‘big data’

Thank you for Listening!
