Sentiment Analysis
PhD Seminar
Balamurali A R (08405401)
Under the guidance of Prof. Pushpak Bhattacharyya
Dept. of CSE, IIT Bombay, Mumbai



o Introduction

o Motivation

o Challenges

o General Model

o Word level sentiment analysis

o Sentence level sentiment analysis

o Comparative sentence analysis

o Document level sentiment analysis

o Conclusion & Future work

o References

Outline

• Advent of UGC – a two-way communication.
• Vast amount of information – most of it direct feedback.
• Objective: to find the sentiment or opinion of a user with regard to an entity/object.
• A fine-grained version of Subjectivity Analysis.
• Subjectivity Analysis – finding whether a phrase, sentence, or document is subjective or objective.

Sentiment Analysis (SA) – Introduction

Businesses and organizations: product and service benchmarking; market intelligence.

People: finding opinions while purchasing a new product; finding opinions on political topics.

Advertisement: placing ads in user-generated content. Place an ad when one praises a product; place an ad from a competitor if one criticizes a product.

Information search & retrieval: providing general search for "opinions".

Motivation

• Opinion holder (source): the person who holds the sentiment. E.g. "I love playing hockey."
E.g. "I agree with what the Pope said: 'hate the sin, not the sinner'" – <I, Pope>

• Object (target): the product, person, organization, or topic on which sentiment is expressed. E.g. "I like the Nano. But I don't like the steering of the Nano."

• Opinion/sentiment: a view or appraisal of an object. E.g. "It's a pity (negative) that she didn't marry."

General Model

• Identifying source and target:
The sum of the parts is not equal to the whole [Turney '02]. Movies and the themes they include – how to separate the sentiment?
"Movie was classic, in fact Gabbar Singh was the epitome of villainy!"

• Differentiating features and attributes:
"I hate the iPod, but I like the scroll technology."

• Role of semantics:
"How could anyone sit through this movie?"

• Issue of ideology [Sack '94]:
"Saddam Hussein" – mixed opinion?

Challenges

Sentiment classification at document level

Sentence level sentiment analysis

Word level sentiment analysis

Comparative sentence analysis

Sentiment Analysis: How to do it?

Word level Sentiment Analysis

• Used for grammatically incoherent text – short newspaper headlines, e.g. "Almost Perfection" [The Hindu, 22/04/09]

• Direct computation using a lexical resource – SentiWordNet, WordNet-Affect

• SentiWordNet – WordNet graded with pos(c), neg(c) & obj(c) scores, e.g. "love"
– Created using classifiers
– Interesting finding: opinionated content is mostly carried by modifiers (adjectives & adverbs), e.g. "smart IITian"

Source: http://sentiwordnet.isti.cnr.it
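As a sketch, word-level polarity can be read straight off a SentiWordNet-style lexicon. The miniature lexicon below is hypothetical; real SentiWordNet assigns pos/neg/obj scores per synset, summing to 1.

```python
# Hypothetical miniature SentiWordNet-style lexicon:
# each word maps to (pos, neg, obj) scores that sum to 1.
LEXICON = {
    "love":    (0.75, 0.00, 0.25),
    "perfect": (0.80, 0.00, 0.20),
    "pity":    (0.00, 0.70, 0.30),
    "hate":    (0.00, 0.85, 0.15),
}

def word_polarity(word):
    """Return 'positive', 'negative', or 'objective' for a word;
    unknown words default to fully objective."""
    pos, neg, obj = LEXICON.get(word.lower(), (0.0, 0.0, 1.0))
    if obj >= max(pos, neg):
        return "objective"
    return "positive" if pos >= neg else "negative"

print(word_polarity("Love"))   # positive
print(word_polarity("pity"))   # negative
```

For grammatically incoherent text such as headlines, this direct lookup (plus handling of modifiers) is often all that is available.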

E.g. "Manmohan insists troops stay in Guwahati, predicts midterm victory"

The system achieves a valence accuracy of 55%.

Pipeline: Pre-process → POS tagging → Dependency parsing → Global sentence rating → Addition of rules → Linear combination of words → Evaluation

UPAR'07

• Contextual information is necessary for SA at the sentence level:
"Indian observers were not happy about things happening in its border country, even though the West was enjoying the show."

• Cannot assign prior polarity to all words!
E.g. long battery life vs. long time to recharge.

• Issue of negation – "not happy"

• Issue of syntactic role – "Polluters are ..." vs. "They are polluters"

• Issue of neutral polarity – "look forward to"

Sentence level Sentiment Analysis

How to include Contextual Polarity

1. Prior clues: assumption – sentence polarity is a product of clues; the different clues are manually detected.

2. Subjectivity detection: classify sentences into polar/neutral; create a feature vector from the polar sentences.

3. Polarity classification: disambiguate contextual polarity and assign polarity.

• 28 features for the neutral-polar classifier, with an accuracy of 75.9%.

• The polarity classifier used 10 features for classification.

• The polarity classifier achieved an accuracy of 65.7%.

Contd.
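The negation issue above can be sketched with a prior-polarity lexicon and a small negation window. The lexicon and the two-token window are illustrative assumptions, not the 28-feature classifier of the actual system.

```python
# Hypothetical prior-polarity clues: +1 positive, -1 negative.
PRIOR = {"happy": 1, "love": 1, "good": 1, "sad": -1, "polluters": -1}
NEGATORS = {"not", "never", "no", "n't"}

def contextual_polarity(tokens):
    """Sum prior polarities over the tokens, flipping a clue's
    polarity when a negator appears within the two preceding tokens."""
    score = 0
    for i, tok in enumerate(tokens):
        prior = PRIOR.get(tok.lower(), 0)
        window = {t.lower() for t in tokens[max(0, i - 2):i]}
        if prior and (NEGATORS & window):
            prior = -prior          # negation flips the prior polarity
        score += prior
    return score

print(contextual_polarity("they are not happy".split()))  # -1
print(contextual_polarity("i love it".split()))           # 1
```

The point of the contextual step is visible here: "happy" has positive prior polarity, but the sentence-level polarity comes out negative.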

• A preferential emotion detection:
"I prefer Prof. X's class to Prof. Y's class."

• More related to opinion mining.

• Common feature – presence of a comparison word, e.g. "IIT Bombay is better than IIT Y".

• The comparison word may or may not be opinionated (emotional state) – e.g. "longer".

• Preference is easy to detect when the comparison word is opinionated:
– Find the context -> <feature, opinion comparison word>, e.g. <battery life, longer>
– Use the context to determine opinion orientation (explained later).

Comparative sentence analysis

• Different types of comparatives: non-equal gradable ("less than"), equative ("same"), superlative ("longest"), non-gradable ("Nano and Supera have got different features").

• Comparative Relation (CR): <long, battery, S1, S2>

• Objective – given a CR, find whether S1 or S2 is preferred.

• Some more categories of comparatives: Type 1 (-er, -est), Type 2 (more, most), increasing comparatives ("longer"), decreasing comparatives ("fewer").

• The final analysis depends on the type of comparative word (C) and the feature involved (F):
– Opinionated comparatives
– Comparatives with context-dependent opinion ("higher mileage")

Comparative sentence analysis
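Detecting the category of a comparative can be sketched with simple surface heuristics. Real systems use POS tags (JJR/JJS) and curated comparison-word lists, so the suffix test below is only an illustrative approximation.

```python
def comparative_type(sentence):
    """Classify a sentence's comparative form:
    'type1'   – -er/-est suffix (better, longest),
    'type2'   – more/most/less/least/fewer,
    'equative'– as ... as,
    None      – no comparative detected."""
    tokens = sentence.lower().split()
    if tokens.count("as") >= 2:
        return "equative"
    if any(t in ("more", "most", "less", "least", "fewer") for t in tokens):
        return "type2"
    # crude suffix heuristic; a POS tagger would avoid false hits
    if any(t.endswith(("er", "est")) and len(t) > 3 for t in tokens):
        return "type1"
    return None

print(comparative_type("IIT Bombay is better than IIT Y"))  # type1
print(comparative_type("X makes more noise than Y"))        # type2
```

Once the type is known, the appropriate preference rule (next slide) is applied.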

• Different cases:

– C opinionated: e.g. "better" – assign S1 as preferred.

– Only F opinionated: e.g. "X makes more noise than Y" – use comparative rules to get the preferred entity.

– Both C & F not opinionated: e.g. "long (battery) life" – use an external source; find OSA(F,C) = log( Pr(f,c) Pr(c|f) / (Pr(f) Pr(c)) ), then apply a decision rule for preference.

– C as a feature indicator: e.g. "Nano is smaller than Indica" – count how many times C appears in Pros and Cons; if #pros(C) > #cons(C), S1 is preferred.

Comparative sentence analysis
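The Pros/Cons counting rule for the feature-indicator case can be sketched directly; the review counts used below are hypothetical.

```python
def preferred_entity(pros_count, cons_count):
    """Decide preference for a comparative word C from Pros/Cons
    corpus counts: if C appears more often in Pros sections of
    reviews, the first entity (S1) is preferred, else S2."""
    return "S1" if pros_count > cons_count else "S2"

# "Nano is smaller than Indica": suppose "small" appears
# 120 times in Pros and 45 times in Cons of car reviews,
# so being smaller is taken as desirable and S1 (Nano) wins.
print(preferred_entity(120, 45))  # S1
```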

• Baseline – default preference S1: 84%
• System accuracy: 94%
• Inference:
– People usually give S1 more preference in comparative sentences.

Comparative sentence analysis

• To classify documents as positive or negative, e.g. "Manali travel review" – Recommended/Not recommended.

Sentiment Analysis – Document Level

• Extract phrases two words long (with context), e.g. "good place".

• Classify based on the average semantic orientation (SO) of the phrases:

SO(phrase) = log2( hits(phrase NEAR "excellent") × hits("poor") / ( hits(phrase NEAR "poor") × hits("excellent") ) )

• Classify based on an average threshold – here it is zero.

• e.g. "unethical practices": SO = -8.484

• Different categories were tested – automobiles, banks, movies, travel destinations.

• Average accuracy 74%, except for movies.

• Movies contain a theme within which the sentiment is expressed, e.g. "Raj's arrogance and sadistic mentality towards society is mercilessly shown by director Mani Ratnam. The film can be regarded as one of the all-time best nonfiction movies."

Sentiment Analysis – Document Level

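The SO(phrase) score above can be computed directly from search-engine hit counts. The counts used here for "good place" are hypothetical.

```python
import math

def semantic_orientation(hits_near_exc, hits_near_poor,
                         hits_excellent, hits_poor):
    """Turney-style SO via PMI-IR:
    SO(phrase) = log2( hits(phrase NEAR excellent) * hits(poor)
               / (hits(phrase NEAR poor) * hits(excellent)) ).
    Positive SO suggests a positive phrase, negative a negative one."""
    return math.log2((hits_near_exc * hits_poor) /
                     (hits_near_poor * hits_excellent))

# Hypothetical hit counts for the phrase "good place":
so = semantic_orientation(2000, 500, 10000, 10000)
print(round(so, 2))  # 2.0 -> positive orientation
```

A document is then classified by averaging the SO of its extracted phrases against the zero threshold.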

Source: Pang & Lee, 2004

Baseline version vs. graph-based min-cut system

Sentiment Analysis – Document Level: A graph based method

• Objective: minimize
cost(C1, C2) = Σ_{x ∈ C1} ind2(x) + Σ_{x ∈ C2} ind1(x) + Σ_{xi ∈ C1, xj ∈ C2} assoc(xi, xj)

• Individual score indj(xi): a non-negative estimate of each xi's preference for being in Cj.

• Association score assoc(xi, xj): a non-negative estimate of how important it is that xi and xj are in the same class.

• Solution: create a graph G = (V, E) with V = {v1, v2, ..., vn, s, t} and partition it into cuts of minimum cost.

• In our case, S and T would be subjective and objective:
– indj(si) = Pr(si | sub)
– assoc(si, sj) = a function of the distance between si and sj

• Accuracy improved from 85.2% to 86.4%.

Contd.
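For tiny inputs the min-cut objective above can be minimized exactly by enumeration (the paper uses an efficient max-flow/min-cut algorithm instead). The individual and association scores below are hypothetical.

```python
from itertools import product

def min_cut_labels(ind1, ind2, assoc):
    """Brute-force the min-cut objective for small n: minimize
    sum of ind2[i] for items in class 1, ind1[i] for items in
    class 2, plus assoc[(i, j)] for every pair split across classes."""
    n = len(ind1)
    best_cost, best = float("inf"), None
    for labels in product((1, 2), repeat=n):
        cost = sum(ind2[i] if labels[i] == 1 else ind1[i]
                   for i in range(n))
        cost += sum(a for (i, j), a in assoc.items()
                    if labels[i] != labels[j])
        if cost < best_cost:
            best_cost, best = cost, labels
    return best, best_cost

# Three sentences: ind1 = Pr(subjective), ind2 = Pr(objective);
# association rewards adjacent sentences sharing a class.
ind1 = [0.9, 0.6, 0.1]
ind2 = [0.1, 0.4, 0.9]
assoc = {(0, 1): 0.5, (1, 2): 0.2}
labels, cost = min_cut_labels(ind1, ind2, assoc)
print(labels)  # (1, 1, 2): first two subjective, last objective
```

The association term is what pulls sentence 2 (an individually borderline case) into the same class as its subjective neighbour.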

Conclusion:
• Different levels of text require different treatment for assessing the sentiment.
• The domain of the text also plays an important role.

Future work:
• Finding the target of the sentiment
• Dealing with sarcasm
• Multilingual sentiment analysis
• Ideology and its handling

Conclusion & Future work

[1]. Warren Sack 1994, On the computation of point of view, Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994, p. 1488.

 

[2]. Pang & Lee 2002, Thumbs up? Sentiment Classification using Machine Learning Techniques, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, ACL, July 2002, pp. 79-86.

 

[3]. Peter Turney 2002, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 417-424.

[4]. C. Strapparava, A. Valitutti 2004, WordNet-Affect: an affective extension of WordNet, Proceedings of LREC, Vol. 4, pp. 1083-1086.

 

[5]. Bo Pang and Lillian Lee 2004, A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, Proceedings of ACL

 

[6]. Wiebe et al. 2005, Annotating Expressions of Opinions and Emotions in Language, Computers and the Humanities, Vol. 39, No. 2-3 (May 2005), pp. 165-210.

 

[7]. Wilson Theresa, Wiebe Janyce, Hoffmann Paul. 2005, Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis, Proceedings of Human Language Technologies Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005)

 

 

References

[8]. Wiebe, J. and Mihalcea, R. 2006, Word sense and subjectivity, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (Sydney, Australia, July 17-18, 2006), pp. 1065-1072.

[9]. Andrea Esuli, Fabrizio Sebastiani 2006, SentiWordNet: A publicly available lexical resource for opinion mining, Proceedings of the 5th Conference on Language Resources and Evaluation (LREC'06), pp. 417-422.

 

[10]. François-Régis Chaumartin 2007, UPAR7: A knowledge-based system for headline sentiment tagging, Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Prague, Association for Computational Linguistics, pp. 422-425.

 

[11]. Liu, Bing 2007, Web Data Mining, Springer, chapter 11

 

[12]. Ganapathibhotla & Liu 2008, Mining Opinions in Comparative Sentences, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp. 241-248.

 

[13].http://emetrics.org/2007/washingtondc/track_web20_measurement.php#usergenerated

 

[14]. http://en.wikipedia.org/wiki/User-generated_content