Sentiment Analysis & Opinion Mining
Lecture Two: March 3, 2011. Aditya M Joshi, M Tech3, CSE, IIT Bombay.
![Page 1: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/1.jpg)
Sentiment Analysis & Opinion Mining
Lecture Two: March 3, 2011
Aditya M Joshi
M Tech3, CSE
IIT Bombay
![Page 2: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/2.jpg)
Sentiment analysis (SA)
Task of tagging text with orientation of opinion
• Subjective: “This is a good movie.” “This is a bad movie.”
• Objective: “The movie is set in Australia.”
RECAP
![Page 3: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/3.jpg)
Challenges of SA
• Domain dependent: Sentiment of a word is w.r.t. the domain.
– Example: ‘unpredictable’ is negative for the steering of a car, but positive for a movie review.
• Sarcasm: Sarcasm uses words of one polarity to represent another polarity.
– Example: “The perfume is so amazing that I suggest you wear it with your windows shut”
• Thwarted expressions: The sentences/words that contradict the overall sentiment of the set are in the majority.
– Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
• Negation
– “I did not like the movie.”
– “Not only is the movie boring, it is also the biggest waste of producer’s money.”
– “Notwithstanding the pressure of the public, let me admit that I have loved the movie.”
• Implicit polarity
– “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
• Time-bounded
– “This phone allows me to send SMS.”
– “This phone has a touch-screen.”
RECAP
![Page 4: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/4.jpg)
How much opinion?
Chart created using: www.technorati.com/chart/
RECAP
![Page 5: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/5.jpg)
Using ML for NLP
• Documents represented as feature vectors for classifiers
– Features: unigrams, etc.
– Models: SVM, NB, etc.
Example: “The movie is set in Australia. The movie is good.”
Unigram counts: The: 2, movie: 2, is: 2, set: 1, in: 1, Australia: 1, good: 1
RECAP
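The unigram representation above can be sketched in a few lines. This is only an illustration: the tokenizer (runs of letters, case preserved) and the function name `unigram_counts` are assumptions, not part of the lecture.

```python
import re
from collections import Counter


def unigram_counts(text: str) -> Counter:
    """Count unigrams (single word tokens) in a piece of text."""
    # Naive tokenizer: runs of letters only; case is kept as on the slide.
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens)


doc = "The movie is set in Australia. The movie is good."
counts = unigram_counts(doc)

# Fixing an order for the vocabulary turns the counts into a feature vector.
vocabulary = sorted(counts)
vector = [counts[word] for word in vocabulary]

print(dict(counts))
print(vocabulary, vector)
```

A classifier such as SVM or NB would then be trained on one such vector per document, all built over a shared vocabulary.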
![Page 6: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/6.jpg)
Support vector machines
• Basic idea
Separating hyperplane
Margin
Support vectors
“Maximum separating-margin classifier”
RECAP
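To make the terms on this slide concrete, the sketch below measures the margin of a given separating hyperplane and picks out the support vectors. It does not train an SVM: the points, the hyperplane `w`, `b`, and the helper name are hypothetical and chosen only for illustration.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical 2-D training points with labels +1 / -1.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Assumed separating hyperplane x1 + x2 - 2 = 0 (given here, not learned).
w, b = (1.0, 1.0), -2.0

# Multiplying by the label makes the distance positive for correctly
# separated points.
distances = [label * signed_distance(w, b, x) for x, label in points]

# The margin is the distance of the closest point to the hyperplane;
# the points that attain it are the support vectors.
margin = min(distances)
support_vectors = [
    x for (x, _), d in zip(points, distances) if math.isclose(d, margin)
]

print(margin, support_vectors)
```

An SVM searches over all separating hyperplanes for the one that makes this margin as large as possible.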
![Page 7: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/7.jpg)
Results
Compared to list-based classifiers (58-69%)
RECAP
![Page 8: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/8.jpg)
Outline
Lecture 1
• Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
• Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA
Lecture 2
• Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
• Applications
![Page 9: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/9.jpg)
Resources for SA
• SentiWordNet
– WordNet synsets marked with three types of scores: positive, negative, objective
Example: “I am feeling happy.”
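A minimal sketch of how such scores might be used on the example sentence. The lexicon below is hypothetical (the numbers are not taken from SentiWordNet), and keying it by word rather than by synset is a simplification that skips word-sense disambiguation. The one property relied on is that the three scores of a synset sum to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    positive: float
    negative: float

    @property
    def objective(self) -> float:
        # The three scores of a synset sum to 1.
        return 1.0 - self.positive - self.negative


# Hypothetical scores for illustration only.
toy_lexicon = {
    "happy": SynsetScores(positive=0.875, negative=0.0),
    "feeling": SynsetScores(positive=0.125, negative=0.125),
}


def sentence_scores(tokens, lexicon):
    """Sum positive and negative scores over tokens found in the lexicon."""
    pos = sum(lexicon[t].positive for t in tokens if t in lexicon)
    neg = sum(lexicon[t].negative for t in tokens if t in lexicon)
    return pos, neg


tokens = "i am feeling happy".split()
pos, neg = sentence_scores(tokens, toy_lexicon)
label = "positive" if pos > neg else "negative" if neg > pos else "neutral"
print(pos, neg, label)
```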
![Page 10: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/10.jpg)
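Seed-set expansion in SWN
• Seed words: Lp (positive seed set) and Ln (negative seed set)
• The seed sets are expanded along WordNet relations such as also-see and antonymy
• The sets at the end of the kth step are called Tr(k,p) and Tr(k,n)
• Tr(k,o) is the set of synsets that are not present in Tr(k,p) or Tr(k,n)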
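The expansion can be sketched as below. The relation tables and vocabulary are hypothetical toy data, and the sketch assumes that also-see preserves polarity while antonymy inverts it, which is the usual reading of this procedure rather than something spelled out on the slide.

```python
def expand_seeds(pos_seeds, neg_seeds, also_see, antonyms, k):
    """Expand positive/negative seed sets for k steps over toy relations."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(also_see.get(term, ()))  # same polarity (assumed)
            new_neg.update(antonyms.get(term, ()))  # opposite polarity (assumed)
        for term in neg:
            new_neg.update(also_see.get(term, ()))
            new_pos.update(antonyms.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


# Hypothetical relation tables, standing in for WordNet.
also_see = {"good": ["nice"], "nice": ["pleasant"], "bad": ["awful"]}
antonyms = {"good": ["bad"], "nice": ["nasty"]}
vocabulary = {"good", "nice", "pleasant", "bad", "awful", "nasty", "wooden"}

tr_p, tr_n = expand_seeds({"good"}, {"bad"}, also_see, antonyms, k=2)
tr_o = vocabulary - tr_p - tr_n  # everything not reached is treated as objective

print(sorted(tr_p), sorted(tr_n), sorted(tr_o))
```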
![Page 11: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/11.jpg)
Building SentiWordNet
• Classifier alternatives used: Rocchio (BowPackage) & SVM (LibSVM)
• Different training data based on expansion
• POS–NOPOS and NEG–NONEG classification
• Total eight classifiers
– For different combinations of k and classifiers
• Synsets not in the expanded seed set are used as test synsets
– Score is average of scores returned by the classifiers
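One way to read the averaging step is sketched below. The combination rule (positive only if POS and NONEG, negative only if NEG and NOPOS, objective otherwise) and the vote data are assumptions made for illustration; the slide only states that the final score is an average over the classifiers.

```python
from collections import Counter


def ternary_label(is_positive: bool, is_negative: bool) -> str:
    """Combine one POS/NOPOS and one NEG/NONEG decision (assumed rule)."""
    if is_positive and not is_negative:
        return "positive"
    if is_negative and not is_positive:
        return "negative"
    return "objective"


def committee_scores(votes):
    """Fraction of classifiers assigning each label to a synset."""
    labels = Counter(ternary_label(p, n) for p, n in votes)
    total = len(votes)
    return {name: labels[name] / total
            for name in ("positive", "negative", "objective")}


# Hypothetical (POS?, NEG?) decisions of eight classifiers for one synset.
votes = [(True, False)] * 5 + [(False, False)] * 2 + [(True, True)]
scores = committee_scores(votes)
print(scores)
```

Averaging 0/1 votes in this way yields three scores that sum to 1, matching the three score types attached to each synset.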
![Page 12: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/12.jpg)
![Page 13: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/13.jpg)
Subjectivity detection
• Aim: To extract subjective portions of text
• Algorithm used: Minimum cut algorithm
![Page 14: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/14.jpg)
Constructing the graph
• Why graphs?
– To model item-specific and pairwise information independently.
• Nodes and edges?
– Nodes: Sentences of the document and source & sink
– Source & sink represent the two classes of sentences
– Edges: Weighted with either of the two scores
• Individual scores
– Prediction whether the sentence is subjective or not
– Ind_sub(s_i) = (formula shown as an image on the slide)
• Association scores
– Prediction whether two sentences should have the same subjectivity level
– T: Threshold, the maximum distance up to which sentences may be considered proximal
– f: The decaying function
– i, j: Position numbers
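The association formula itself did not survive in this transcript. The sketch below assumes a common thresholded-decay form, assoc(i, j) = f(|j - i|) · c when the distance is at most T and 0 otherwise; the constant `c` and the particular decay function are hypothetical choices.

```python
def assoc(i, j, T, f, c=1.0):
    """Association score between sentences at positions i and j (assumed form)."""
    distance = abs(j - i)
    if distance == 0 or distance > T:
        # Sentences further apart than T are not considered proximal.
        return 0.0
    return f(distance) * c


def inverse_square(d):
    """One possible decaying function: the score falls off as 1 / d^2."""
    return 1.0 / (d * d)


print(assoc(1, 2, T=3, f=inverse_square))  # adjacent sentences
print(assoc(1, 3, T=3, f=inverse_square))  # two positions apart
print(assoc(1, 5, T=3, f=inverse_square))  # beyond the threshold
```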
![Page 15: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/15.jpg)
Constructing the graph
• Build an undirected graph G with vertices {v1, v2…,s, t} (sentences and s,t)
• Add edges (s, vi) each with weight ind1(xi)
• Add edges (t, vi) each with weight ind2(xi)
• Add edges (vi, vk) with weight assoc (vi, vk)
• Partition cost:
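The partition cost is not reproduced in this transcript. For the graph built above, a cut that leaves class C1 on the source side removes the edges (t, vi) for sentences in C1, the edges (s, vi) for sentences in C2, and every association edge that crosses the two classes. The sketch assumes exactly that cost and finds the cheapest partition by brute force; all scores are hypothetical.

```python
from itertools import product


def partition_cost(in_c1, ind1, ind2, assoc):
    """Cost of a partition; sentence k is in C1 when in_c1[k] is True."""
    n = len(in_c1)
    cost = 0.0
    for k in range(n):
        # A sentence placed in C1 pays its affinity for C2, and vice versa.
        cost += ind2[k] if in_c1[k] else ind1[k]
    for i in range(n):
        for k in range(i + 1, n):
            if in_c1[i] != in_c1[k]:
                # Separating two associated sentences costs their assoc score.
                cost += assoc.get((i, k), 0.0)
    return cost


def best_partition(ind1, ind2, assoc):
    """Try every partition; fine for a handful of sentences only."""
    return min(product([True, False], repeat=len(ind1)),
               key=lambda p: partition_cost(p, ind1, ind2, assoc))


# Hypothetical individual and association scores for three sentences.
ind1 = [0.8, 0.5, 0.1]  # affinity for C1 (e.g. subjective)
ind2 = [0.2, 0.5, 0.9]  # affinity for C2 (e.g. objective)
assoc_scores = {(0, 1): 1.0, (1, 2): 0.1}

best = best_partition(ind1, ind2, assoc_scores)
best_cost = partition_cost(best, ind1, ind2, assoc_scores)
print(best, best_cost)
```

Here the undecided middle sentence follows its strongly associated neighbour into C1. Brute force is exponential in the number of sentences; a maximum-flow algorithm finds the same minimum cut efficiently.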
![Page 16: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/16.jpg)
Example
Sample cuts:
![Page 17: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/17.jpg)
Results (1/2)
• Naïve Bayes, no extraction: 82.8%
• Naïve Bayes, subjective extraction: 86.4%
• Naïve Bayes, ‘flipped experiment’: 71%
Diagram: Document → Subjectivity detector → Subjective / Objective → POLARITY CLASSIFIER
![Page 18: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/18.jpg)
Results (2/2)
![Page 19: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/19.jpg)
![Page 20: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/20.jpg)
Adjectives for SA
• Many adjectives have high sentiment value
– A ‘beautiful’ bag
– A ‘wooden’ bench
– An ‘embarrassing’ performance
– A ‘nice wooden’ bench
– A ‘wooden nice’ bench
• An idea would be to add this polarity information to adjectives in WordNet
![Page 21: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/21.jpg)
Setup
• Two anchor words (extremes of the polarity spectrum) were chosen
• PMI of adjectives with respect to these anchor words is calculated
Polarity Score(W) = PMI(W, excellent) – PMI(W, poor)
Diagram: the word is linked by PMI to each anchor word, ‘excellent’ and ‘poor’
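A small worked sketch of the polarity score, using the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The co-occurrence counts are hypothetical, and how co-occurrence is counted (for instance, within a window of words) is left open here.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts."""
    p_xy = joint_count / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def polarity_score(word, counts, cooc, total):
    """Polarity Score(W) = PMI(W, excellent) - PMI(W, poor)."""
    with_excellent = pmi(cooc[(word, "excellent")], counts[word],
                         counts["excellent"], total)
    with_poor = pmi(cooc[(word, "poor")], counts[word],
                    counts["poor"], total)
    return with_excellent - with_poor


# Hypothetical corpus statistics for illustration only.
total = 1000
counts = {"beautiful": 50, "excellent": 100, "poor": 100}
cooc = {("beautiful", "excellent"): 20, ("beautiful", "poor"): 5}

score = polarity_score("beautiful", counts, cooc, total)
print(score)
```

With these counts ‘beautiful’ co-occurs with ‘excellent’ four times more often than chance and with ‘poor’ exactly at chance, so the score is about 2. A zero co-occurrence count would need smoothing before taking the logarithm.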
![Page 22: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/22.jpg)
Experimentation
• K-means clustering algorithm used on the basis of polarity scores
• The clusters contain words with similar polarities
• These words can be linked using an ‘isopolarity link’ in WordNet
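Because each adjective is described by a single polarity score, the clustering is one-dimensional. A minimal sketch of K-means (Lloyd's algorithm) on scalars follows; the words, their scores, and the starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on scalar values, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical polarity scores for a few adjectives.
polarity = {"superb": 2.0, "lovely": 1.5, "wooden": 0.1,
            "plain": -0.1, "dull": -1.5, "awful": -2.0}

centroids, clusters = kmeans_1d(list(polarity.values()),
                                centroids=[-1.0, 0.0, 1.0])
print(centroids)
print(clusters)
```

Words whose scores fall in the same cluster are the candidates for an ‘isopolarity link’.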
![Page 23: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/23.jpg)
Results
• Three clusters seen
• Most words had negative polarity scores
• Obscure words (the ones that are not very common) were removed by selecting adjectives with a familiarity count of 3
• Also reports an improvement when the scores are used as feature values
![Page 24: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/24.jpg)
![Page 25: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/25.jpg)
Subject-based SA
The horse bolted.
The movie lacks a good story.
![Page 26: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/26.jpg)
Lexicon
• subj. bolt
– Entry: b VB bolt subj
– subj: argument that receives the sentiment
• subj. lack obj.
– Entry: b VB lack obj ~subj
– obj: argument that sends the sentiment
– ~subj: argument that receives the sentiment
![Page 27: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/27.jpg)
Lexicon
• Also allows ‘\S+’ characters
• Similar to regular expressions
• E.g. to put \S+ to risk
– The favorability of the subject depends on the favorability of ‘\S+’.
![Page 28: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/28.jpg)
Example
The movie lacks a good story.
Lexicon:
– G JJ good obj.
– B VB lack obj ~subj.
Steps:
1) Consider a context window of up to five words
2) Shallow parse the sentence
3) Step-by-step calculate the sentiment value based on the lexicon and by adding ‘\S+’ characters at each step, e.g. “The movie lacks \S+.”
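The transfer step in this example can be sketched as follows. This is a heavily simplified, hypothetical model: it skips the context window and shallow parsing, takes the verb and object phrase as given, and assumes that ‘~’ in ‘~subj’ means the subject receives the opposite of the object's polarity, which is how the example reads.

```python
# Hypothetical mini-lexicon modelled on the two entries above.
ADJECTIVE_POLARITY = {"good": "g", "bad": "b"}  # cf. "G JJ good obj."
TRANSFER_VERBS = {"lack": "~subj"}              # cf. "B VB lack obj ~subj."
FLIP = {"g": "b", "b": "g"}


def subject_sentiment(verb, object_phrase):
    """Sentiment received by the subject of `verb`, or None if undetermined."""
    receiver = TRANSFER_VERBS[verb]
    polarities = [ADJECTIVE_POLARITY[w]
                  for w in object_phrase.lower().split()
                  if w in ADJECTIVE_POLARITY]
    if not polarities:
        return None
    object_polarity = polarities[0]
    # "~" is assumed to mean that the polarity is inverted on transfer.
    if receiver.startswith("~"):
        return FLIP[object_polarity]
    return object_polarity


# "The movie lacks a good story." -> what does the movie (subject) receive?
result = subject_sentiment("lack", "a good story")
print(result)
```

The favorable object ‘a good story’ sends its sentiment through ‘lack’, so the subject ‘the movie’ ends up unfavorable (‘b’).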
![Page 29: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/29.jpg)
Results

| Corpus | Description | Precision | Recall |
|---|---|---|---|
| Benchmark corpus | Mixed statements | 94.3% | 28% |
| Open Test corpus | Reviews of a camera | 94% | 24% |
![Page 30: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/30.jpg)
• Applications: Cross-lingual SA, Cross-domain SA, Opinion spam, SA for tweets
![Page 31: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/31.jpg)
Cross-lingual SA
Diagram: Hindi document / English document → Sentiment Analysis System → Sentiment Label
• Multilingual content on the internet growing
• How can the sentiment it carries be identified?
• Can we take help of the ‘rich cousin’ English?
![Page 32: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/32.jpg)
Alternatives to Cross-lingual SA
Strategies for SA for target language:
• Use corpus in target language
• Translate to a ‘rich’ source language
• Develop resources for target language
![Page 33: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/33.jpg)
![Page 34: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/34.jpg)
Domain-dependence of words
• ‘deadly’
– It was one deadly match!
– There are some deadly poisonous snakes in the jungles of Amazon.
![Page 35: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/35.jpg)
General Approach
• Retain the ‘common-to-all-domain’ words
• Learn only the ‘special domain’ words
• Domain differences can be substantial
![Page 36: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/36.jpg)
![Page 37: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/37.jpg)
Opinion spam: A side-effect of UGC
• Reviews contain rich user opinions on products and services
• Anyone can write anything on the Web
– No quality control
• Result: Low quality reviews, review spam / opinion spam.
• Incentives: Positive opinion -> Financial gain for organization
![Page 38: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/38.jpg)
Different types of spam reviews
• Type 1 (untruthful opinions)
– Giving undeserving reviews to some target objects in order to promote/demote the object
– Hyper spam: undeserving positive reviews
– Defaming spam: malicious negative reviews
– Duplicates
• Type 2 (reviews on brands only)
– No comment on the product
– Comments on brands, manufacturer or sellers of the product
• Type 3 (non-reviews)
– Advertisements
– Other irrelevant reviews containing no opinions, e.g. questions, answers and random text
Examples:
– “Although you should not expect prompt shipping. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right - http://www.pricegrabber.com”
– “It’s from nikon, what more you want..”
Reference: [Jindal et al, 2008]
![Page 39: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/39.jpg)
![Page 40: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/40.jpg)
Challenges with tweets
• Ill-formed
– Spelling mistakes
– Informal words/emoticons
– Extensions of words (‘happppyyyyy’)
• Vague topics
www.clia.iitb.ac.in:8080/TwitterApp/index.jap
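One common preprocessing step for extended words is to squeeze long runs of a repeated character. The sketch below is an assumption about how that might be done, not a description of the system linked above; the function name and the choice of keeping at most two repeats are hypothetical.

```python
import re


def squeeze_elongation(token: str, max_repeat: int = 2) -> str:
    """Collapse runs of one character longer than max_repeat."""
    # (.)\1{n,} matches a character followed by at least n copies of itself.
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, token)


print(squeeze_elongation("happppyyyyy"))
print(squeeze_elongation("good"))
```

Keeping two repeats leaves legitimate double letters (as in ‘good’) untouched; the squeezed form ‘happyy’ would still need a dictionary or spelling lookup to reach ‘happy’.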
![Page 41: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/41.jpg)
SOME ACTUAL APPLICATIONS
Mood analysis
• Real-time updating of moods w.r.t. a topic
Snapshot: MoodViews
![Page 42: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/42.jpg)
Semantic search
• Sentiment search API by Evri
• Claims to allow answers to deeper questions like “who” and “why”
![Page 43: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/43.jpg)
A zeitgeist
• Understanding the ‘climate’
Snapshot: Twitscoop
![Page 44: Sentiment Analysis & Opinion Mining](https://reader035.vdocument.in/reader035/viewer/2022062218/56816374550346895dd4512d/html5/thumbnails/44.jpg)
… and many more