Sentiment Analysis - Stanford University · 2018-10-25 · Dan Jurafsky
TRANSCRIPT
Sentiment Analysis
What is Sentiment Analysis?

Dan Jurafsky
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
Dan Jurafsky
Google Shopping aspects
https://www.google.com/shopping/product/7914298775914872081
Dan Jurafsky
Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. ICWSM-2010.
Dan Jurafsky
Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.
Dan Jurafsky
Target Sentiment on Twitter
• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision.
Dan Jurafsky
Sentiment analysis has many other names
• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis
Dan Jurafsky
Why sentiment analysis?
• Movies: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment
Dan Jurafsky
Scherer Typology of Affective States
• Emotion: brief, organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse, non-caused, low-intensity, long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous
Dan Jurafsky
Sentiment Analysis
• Sentiment analysis is the detection of attitudes: "enduring, affectively colored beliefs, dispositions towards objects or persons"
  1. Holder (source) of the attitude
  2. Target (aspect) of the attitude
  3. Type of attitude
     • From a set of types: like, love, hate, value, desire, etc.
     • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
  4. Text containing the attitude: a sentence or an entire document
Dan Jurafsky
Sentiment Analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target (stance detection)
  • Detect the source
  • Complex attitude types
Sentiment Analysis
What is Sentiment Analysis?

Sentiment Analysis
A Baseline Algorithm
Dan Jurafsky
Sentiment Classification in Movie Reviews
• Polarity detection: is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.
Dan Jurafsky
IMDB data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image - that of a single white dot, traveling horizontally across the night sky. [...]

✗ "snake eyes" is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.
Dan Jurafsky
Baseline Algorithm (adapted from Pang and Lee)
• Tokenization
• Feature extraction
• Classification using different classifiers
  • Naive Bayes
  • MaxEnt
  • SVM
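A minimal sketch of such a pipeline in Python, assuming scikit-learn is available; this is not Pang and Lee's exact system, and the tiny texts/labels toy data below are illustrative stand-ins for the Polarity 2.0 corpus.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# hypothetical stand-in data (0 = negative, 1 = positive)
texts = ["unbelievably disappointing",
         "full of zany characters and richly applied satire , and some great plot twists",
         "this is the greatest screwball comedy ever filmed",
         "it was pathetic , the worst part about it was the boxing scenes"]
labels = [0, 1, 1, 0]

# tokenization + bag-of-words feature extraction + Naive Bayes classification
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["richly applied satire and great plot twists"]))  # -> [1]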
Dan Jurafsky
Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hashtags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O'Connor twitter tokenizer

Potts emoticon pattern:
    [<>]?                          # optional hat/brow
    [:;=8]                         # eyes
    [\-o\*\']?                     # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    |                              #### reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    [\-o\*\']?                     # optional nose
    [:;=8]                         # eyes
    [<>]?                          # optional hat/brow
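For illustration, the pattern above can be used directly with Python's re module in verbose mode, which keeps the inline comments; the test string is made up.

import re

EMOTICON = re.compile(r"""
    [<>]?                          # optional hat/brow
    [:;=8]                         # eyes
    [\-o\*\']?                     # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    |                              #### reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    [\-o\*\']?                     # optional nose
    [:;=8]                         # eyes
    [<>]?                          # optional hat/brow
    """, re.VERBOSE)

print(EMOTICON.findall("great movie :) but the ending ... :-( d:"))
# -> [':)', ':-(', 'd:']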
Dan Jurafsky
Extracting Features for Sentiment Classification
• How to handle negation?
  • I didn't like this movie
  vs.
  • Don't dismiss this film
Dan Jurafsky
Negation
Add NOT_ to every word between the negation and the following punctuation:

  didn't like this movie , but I
  didn't NOT_like NOT_this NOT_movie , but I

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
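A rough Python sketch of this NOT_ marking; the negation and punctuation patterns are simplifications, not the exact Das and Chen / Pang et al. implementation.

import re

NEGATION = re.compile(r"^(?:not|no|never|n't|.*n't)$", re.IGNORECASE)
PUNCT = re.compile(r"^[.,:;!?]$")

def mark_negation(tokens):
    """Prepend NOT_ to every token between a negation word and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negated = False          # punctuation closes the negation scope
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok):  # negation token opens the scope
                negated = True
    return out

print(mark_negation("didn't like this movie , but I".split()))
# -> ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']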
Dan Jurafsky
Extracting Features for Sentiment Classification
• Which words to use?
  • Only adjectives
  • All words
• All words turns out to work better, at least on this data
Dan Jurafsky
Reminder: Naive Bayes

c_{NB} = \argmax_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)
Dan Jurafsky
Reminder: Naive Bayes
Let N_c be the number of documents with class c.
Let N_doc be the total number of documents.
positions ← all word positions in the test document

c_{NB} = \argmax_{c \in C} P(c) \prod_{i \in positions} P(w_i \mid c)    (6.9)

Naive Bayes calculations, like calculations for language modeling, are done in log space, to avoid underflow and increase speed. Thus Eq. 6.9 is generally instead expressed as

c_{NB} = \argmax_{c \in C} \left[ \log P(c) + \sum_{i \in positions} \log P(w_i \mid c) \right]    (6.10)

By considering features in log space, Eq. 6.10 computes the predicted class as a linear function of input features. Classifiers that use a linear combination of the inputs to make a classification decision (like naive Bayes and also logistic regression) are called linear classifiers.

6.2 Training the Naive Bayes Classifier

How can we learn the probabilities P(c) and P(f_i | c)? Let's first consider the maximum likelihood estimate. We'll simply use the frequencies in the data. For the document prior P(c) we ask what percentage of the documents in our training set are in each class c. Let N_c be the number of documents in our training data with class c and N_doc be the total number of documents. Then:

\hat{P}(c) = \frac{N_c}{N_{doc}}    (6.12)

To learn the probability P(f_i | c), we'll assume a feature is just the existence of a word in the document's bag of words, and so we'll want P(w_i | c), which we compute as the fraction of times the word w_i appears among all words in all documents of topic c. We first concatenate all documents with category c into one big "category c" text. Then we use the frequency of w_i in this concatenated document to give a maximum likelihood estimate of the probability:

\hat{P}(w_i \mid c) = \frac{count(w_i, c)}{\sum_{w \in V} count(w, c)}    (6.13)

Here the vocabulary V consists of the union of all the word types in all classes, not just the words in one class c.

There is a problem, however, with maximum likelihood training. Imagine we are trying to estimate the likelihood of the word "fantastic" given class positive, but suppose there are no training documents that both contain the word "fantastic" and are classified as positive. Perhaps the word "fantastic" happens to occur (sarcastically?) in the class negative. In such a case the probability for this feature will be zero:
Dan Jurafsky
Reminder: Naive Bayes
• Likelihoods
• What about zeros? Suppose "fantastic" never occurs?
• Add-one smoothing
\hat{P}(\textrm{"fantastic"} \mid positive) = \frac{count(\textrm{"fantastic"}, positive)}{\sum_{w \in V} count(w, positive)} = 0    (6.14)

But since naive Bayes naively multiplies all the feature likelihoods together, zero probabilities in the likelihood term for any class will cause the probability of the class to be zero, no matter the other evidence!

The simplest solution is the add-one (Laplace) smoothing introduced in Chapter 4. While Laplace smoothing is usually replaced by more sophisticated smoothing algorithms in language modeling, it is commonly used in naive Bayes text categorization:

\hat{P}(w_i \mid c) = \frac{count(w_i, c) + 1}{\sum_{w \in V} \left( count(w, c) + 1 \right)} = \frac{count(w_i, c) + 1}{\left( \sum_{w \in V} count(w, c) \right) + |V|}    (6.15)

Note once again that it is crucial that the vocabulary V consists of the union of all the word types in all classes, not just the words in one class c (try to convince yourself why this must be true; see the exercise at the end of the chapter).

What do we do about words that occur in our test data but are not in our vocabulary at all because they did not occur in any training document in any class? The standard solution for such unknown words is to ignore them: remove them from the test document and not include any probability for them at all.

Finally, some systems choose to completely ignore another class of words: stop words, very frequent words like "the" and "a". This can be done by sorting the vocabulary by frequency in the training set and defining the top 10-100 vocabulary entries as stop words, or alternatively by using one of the many predefined stop word lists available online. Then every instance of these stop words is simply removed from both training and test documents as if they had never occurred. In most text classification applications, however, using a stop word list doesn't improve performance, and so it is more common to make use of the entire vocabulary and not use a stop word list.

Fig. 6.2 shows the final algorithm.

6.3 Worked example

Let's walk through an example of training and testing naive Bayes with add-one smoothing. We'll use a sentiment analysis domain with the two classes positive (+) and negative (-), and take the following miniature training and test documents simplified from actual movie reviews.

           Cat  Documents
  Training  -   just plain boring
            -   entirely predictable and lacks energy
            -   no surprises and very few laughs
            +   very powerful
            +   the most fun film of the summer
  Test      ?   predictable with no fun

The prior P(c) for the two classes is computed via Eq. 6.12 as N_c / N_doc:
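The worked example above can be reproduced in a few lines of Python; this is a sketch of standard multinomial naive Bayes with add-one smoothing (Eqs. 6.10-6.15), not the textbook's reference implementation.

import math
from collections import Counter

train = [("-", "just plain boring"),
         ("-", "entirely predictable and lacks energy"),
         ("-", "no surprises and very few laughs"),
         ("+", "very powerful"),
         ("+", "the most fun film of the summer")]

counts = {c: Counter() for c in ("+", "-")}
ndoc = {c: 0 for c in ("+", "-")}
for c, doc in train:
    ndoc[c] += 1
    counts[c].update(doc.split())

vocab = set(w for c in counts for w in counts[c])

def log_score(doc, c):
    logp = math.log(ndoc[c] / len(train))          # log prior, Eq. 6.12
    for w in doc.split():
        if w not in vocab:                         # ignore unknown words
            continue
        num = counts[c][w] + 1                     # add-one smoothing, Eq. 6.15
        den = sum(counts[c].values()) + len(vocab)
        logp += math.log(num / den)
    return logp

test = "predictable with no fun"
print({c: log_score(test, c) for c in ("+", "-")})  # "-" wins, matching the textbook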
Dan Jurafsky
Binarized (Boolean feature) Multinomial Naive Bayes
• Intuition:
  • For sentiment (and probably for other text classification domains), word occurrence may matter more than word frequency.
  • The occurrence of the word "fantastic" tells us a lot; the fact that it occurs 5 times may not tell us much more.
• "Binary Naive Bayes": clip all the word counts in each document at 1
Dan Jurafsky
Boolean Multinomial Naive Bayes: Learning
• From the training corpus, extract the Vocabulary
• Calculate the P(c_j) terms
  • For each c_j in C do
    docs_j ← all docs with class = c_j
    P(c_j) ← |docs_j| / |total # documents|
• Calculate the P(w_k | c_j) terms
  • Remove duplicates in each doc: for each word type w in doc_j, retain only a single instance of w
  • Text_j ← single doc containing all of docs_j
  • For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    P(w_k | c_j) ← (n_k + α) / (n + α·|Vocabulary|)
Dan Jurafsky
Boolean Multinomial Naive Bayes (Binary NB) on a test document d
• First remove all duplicate words from d
• Then compute NB using the same equation:

c_{NB} = \argmax_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)
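A small sketch of the binarization (duplicate-removal) step in Python; binarize is a hypothetical helper name, not from the lecture.

def binarize(doc_tokens):
    """Keep only the first occurrence of each word, clipping per-document counts at 1."""
    seen, out = set(), []
    for tok in doc_tokens:
        if tok not in seen:
            seen.add(tok)
            out.append(tok)
    return out

print(binarize("it was pathetic the worst part was the boxing scenes".split()))
# -> ['it', 'was', 'pathetic', 'the', 'worst', 'part', 'boxing', 'scenes']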
Dan Jurafsky
Normal vs. Binary NB
P(+)P(S \mid +) = \frac{2}{5} \times \frac{1 \times 1 \times 2}{29^3} = 3.2 \times 10^{-5}

The model thus predicts the class negative for the test sentence.

6.4 Optimizing for Sentiment Analysis

While standard naive Bayes text classification can work well for sentiment analysis, some small changes are generally employed that improve performance.

First, for sentiment classification and a number of other text classification tasks, whether a word occurs or not seems to matter more than its frequency. Thus it often improves performance to clip the word counts in each document at 1. This variant is called binary multinomial naive Bayes or binary NB. The variant uses the same Eq. 6.10 except that for each document we remove all duplicate words before concatenating them into the single big document. Fig. 6.3 shows an example in which a set of four documents (shortened and text-normalized for this example) are remapped to binary, with the modified counts shown in the table on the right. The example is worked without add-1 smoothing to make the differences clearer. Note that the resulting counts need not be 1; the word "great" has a count of 2 even for binary NB, because it appears in multiple documents.

Four original documents:
  - it was pathetic the worst part was the boxing scenes
  - no plot twists or great scenes
  + and satire and great plot twists
  + great scenes great film

After per-document binarization:
  - it was pathetic the worst part boxing scenes
  - no plot twists or great scenes
  + and satire great plot twists
  + great scenes film

              NB Counts   Binary Counts
              +    -       +    -
  and         2    0       1    0
  boxing      0    1       0    1
  film        1    0       1    0
  great       3    1       2    1
  it          0    1       0    1
  no          0    1       0    1
  or          0    1       0    1
  part        0    1       0    1
  pathetic    0    1       0    1
  plot        1    1       1    1
  satire      1    0       1    0
  scenes      1    2       1    2
  the         0    2       0    1
  twists      1    1       1    1
  was         0    2       0    1
  worst       0    1       0    1

Figure 6.3 An example of binarization for the binary naive Bayes algorithm.

A second important addition commonly made when doing text classification for sentiment is to deal with negation. Consider the difference between "I really like this movie" (positive) and "I didn't like this movie" (negative). The negation expressed by "didn't" completely alters the inferences we draw from the predicate "like". Similarly, negation can modify a negative word to produce a positive review ("don't dismiss this film", "doesn't let us get bored").

A very simple baseline that is commonly used in sentiment to deal with negation is, during text normalization, to prepend the prefix NOT to every word after a token of logical negation (n't, not, no, never) until the next punctuation mark. Thus the phrase
Dan Jurafsky
Binary NB
• Binary works better than full word counts for sentiment classification

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Wang, Sida, and Christopher D. Manning. 2012. "Baselines and bigrams: Simple, good sentiment and topic classification." Proceedings of ACL, 90-94.
Dan Jurafsky
Cross-Validation
• Break up the data into 5 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose that fold as a temporary test set
  • Train on the other 4 folds, compute performance on the test fold
• Report the average performance of the 5 runs

[Figure: 5 iterations; in each iteration a different fold serves as the test set and the remaining 4 folds as training data.]
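A sketch of this 5-fold setup assuming scikit-learn; the small stand-in corpus below is made up (in practice it would be the 2000 labeled Pang and Lee reviews), and StratifiedKFold keeps the positive/negative proportions equal across folds.

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great plot and a wonderful cast", "pathetic and boring",
         "a terrific , moving film", "unbelievably disappointing",
         "the greatest comedy ever filmed", "the worst movie of the year",
         "richly applied satire", "lacks energy and surprises",
         "very powerful and fun", "plain boring"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())

# 5 folds, each with the same positive/negative ratio as the whole set
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv)
print(scores, scores.mean())   # per-fold accuracy and the averaged result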
Dan Jurafsky
Other issues in Classification
• Logistic Regression and SVMs tend to do better than Naive Bayes
Dan Jurafsky
Problems: What makes reviews hard to classify?
• Subtlety:
  • Perfume review in Perfumes: the Guide:
    "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
  • Dorothy Parker on Katharine Hepburn:
    "She runs the gamut of emotions from A to B"
Dan Jurafsky
Thwarted Expectations and Ordering Effects
• "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
• "Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised."
Sentiment Analysis
A Baseline Algorithm

Sentiment Analysis
Sentiment Lexicons
Dan Jurafsky
The General Inquirer
• Homepage: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs. Weak, Active vs. Passive, Overstated vs. Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
• Free for research use

Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.
Dan Jurafsky
LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX.
• Homepage: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• $30 or $90 fee
Dan Jurafsky
MPQA Subjectivity Cues Lexicon
• Homepage: http://mpqa.cs.pitt.edu/lexicons/
• 6885 words from 8221 lemmas
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
Dan Jurafsky
Bing Liu Opinion Lexicon
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative

Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
Sentiment Analysis
Sentiment Lexicons

Sentiment Analysis
Learning Sentiment Lexicons
Dan Jurafsky
Semi-supervised learning of lexicons
• What to do for domains where you don't have a lexicon?
• Learn a lexicon!
• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• to bootstrap a lexicon
Dan Jurafsky
Semi-supervised learning of lexicons

The General Inquirer is a freely available web resource with lexicons of 1915 positive words and 2291 negative words (and also includes other lexicons we'll discuss in the next section).

The MPQA Subjectivity lexicon (Wilson et al., 2005) has 2718 positive and 4912 negative words drawn from a combination of sources, including the General Inquirer lists, the output of the Hatzivassiloglou and McKeown (1997) system described below, and a bootstrapped list of subjective words and phrases (Riloff and Wiebe, 2003) that was then hand-labeled for sentiment. Each phrase in the lexicon is also labeled for reliability (strongly subjective or weakly subjective). The polarity lexicon of Hu and Liu (2004) gives 2006 positive and 4783 negative words, drawn from product reviews, labeled using a bootstrapping method from WordNet described in the next section.

Positive: admire, amazing, assure, celebration, charm, eager, enthusiastic, excellent, fancy, fantastic, frolic, graceful, happy, joy, luck, majesty, mercy, nice, patience, perfect, proud, rejoice, relief, respect, satisfactorily, sensational, super, terrific, thank, vivid, wise, wonderful, zest
Negative: abominable, anger, anxious, bad, catastrophe, cheap, complaint, condescending, deceit, defective, disappointment, embarrass, fake, fear, filthy, fool, guilt, hate, idiot, inflict, lazy, miserable, mourn, nervous, objection, pest, plot, reject, scream, silly, terrible, unfriendly, vile, wicked

Figure 18.2 Some samples of words with consistent sentiment across three sentiment lexicons: the General Inquirer (Stone et al., 1966), the MPQA Subjectivity lexicon (Wilson et al., 2005), and the polarity lexicon of Hu and Liu (2004).

18.2 Semi-supervised induction of sentiment lexicons

Some affective lexicons are built by having humans assign ratings to words; this was the technique for building the General Inquirer starting in the 1960s (Stone et al., 1966), and for modern lexicons based on crowd-sourcing to be described in Section 18.5.1. But one of the most powerful ways to learn lexicons is to use semi-supervised learning.

In this section we introduce three methods for semi-supervised learning that are important in sentiment lexicon extraction. The three methods all share the same intuitive algorithm, which is sketched in Fig. 18.3.

function BuildSentimentLexicon(posseeds, negseeds) returns poslex, neglex
  poslex ← posseeds
  neglex ← negseeds
  until done:
    poslex ← poslex + FindSimilarWords(poslex)
    neglex ← neglex + FindSimilarWords(neglex)
  poslex, neglex ← PostProcess(poslex, neglex)

Figure 18.3 Schematic for semi-supervised sentiment lexicon induction. Different algorithms differ in how words of similar polarity are found, in the stopping criterion, and in the post-processing.
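A runnable Python rendering of the Fig. 18.3 schematic; find_similar_words is a hypothetical plug-in (conjunction patterns, co-occurrence with seeds, or embedding neighbors in real systems), and the toy neighbor table exists only to exercise the loop.

def build_sentiment_lexicon(pos_seeds, neg_seeds, find_similar_words, n_rounds=3):
    poslex, neglex = set(pos_seeds), set(neg_seeds)
    for _ in range(n_rounds):                  # "until done": fixed rounds here
        poslex |= find_similar_words(poslex)
        neglex |= find_similar_words(neglex)
    overlap = poslex & neglex                  # post-process: drop words in both lists
    return poslex - overlap, neglex - overlap

# toy similarity function just to show the control flow
toy_neighbors = {"good": {"great", "nice"}, "great": {"excellent"},
                 "bad": {"poor", "terrible"}, "poor": {"awful"}}
sim = lambda words: set().union(*(toy_neighbors.get(w, set()) for w in words))
print(build_sentiment_lexicon({"good"}, {"bad"}, sim))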
Dan Jurafsky
Hatzivassiloglou and McKeown intuition for identifying word polarity
• Adjectives conjoined by "and" have the same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by "but" do not
  • fair but brutal

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174-181.
Dan Jurafsky
Hatzivassiloglou & McKeown 1997, Step 1
• Label a seed set of 1336 adjectives (all >20 in a 21 million word WSJ corpus)
  • 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving …
  • 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting …
Dan Jurafsky
Hatzivassiloglou & McKeown 1997, Step 2
• Expand the seed set to conjoined adjectives
  • nice, helpful
  • nice, classy
Dan Jurafsky
Hatzivassiloglou & McKeown 1997, Step 3
• A supervised classifier assigns a "polarity similarity" to each word pair, resulting in a graph:

[Figure: graph over the words classy, nice, helpful, fair, brutal, irrational, corrupt, with edges weighted by polarity similarity.]
Dan Jurafsky
Hatzivassiloglou & McKeown 1997, Step 4
• Clustering for partitioning the graph into two

[Figure: the same graph partitioned into a positive cluster (classy, nice, helpful, fair) and a negative cluster (brutal, irrational, corrupt).]
Dan Jurafsky
Output polarity lexicon
• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty …
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful …
Dan Jurafsky
Turney Algorithm
1. Extract a phrasal lexicon from reviews
2. Learn the polarity of each phrase
3. Rate a review by the average polarity of its phrases

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.
Dan Jurafsky
Extract two-word phrases with adjectives

  First Word   Second Word   Third Word (not extracted)
  Adj          Noun          anything
  Adverb       Adj           not noun
  Adj          Adj           not noun
  Noun         Adj           not noun
  Adverb       Verb          anything
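A sketch of extracting such two-word phrases from POS-tagged text, assuming NLTK and mapping the table's Adj/Adverb/Noun/Verb roughly onto Penn Treebank tags; Turney's original system used different tooling, and the example sentence is made up.

import nltk  # may require: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

ADJ, ADV = {"JJ", "JJR", "JJS"}, {"RB", "RBR", "RBS"}
NOUN, VERB = {"NN", "NNS"}, {"VB", "VBD", "VBN", "VBG"}

def extract_phrases(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    phrases = []
    # look at each consecutive word pair plus the tag of the following word
    for (w1, t1), (w2, t2), (_, t3) in zip(tagged, tagged[1:], tagged[2:] + [("", "")]):
        ok = ((t1 in ADJ and t2 in NOUN) or
              (t1 in ADV and t2 in ADJ and t3 not in NOUN) or
              (t1 in ADJ and t2 in ADJ and t3 not in NOUN) or
              (t1 in NOUN and t2 in ADJ and t3 not in NOUN) or
              (t1 in ADV and t2 in VERB))
        if ok:
            phrases.append(w1 + " " + w2)
    return phrases

print(extract_phrases("The branch is inconveniently located but the online experience is very handy ."))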
Dan Jurafsky
How to measure the polarity of a phrase?
• Positive phrases co-occur more with "excellent"
• Negative phrases co-occur more with "poor"
• But how to measure co-occurrence?
Dan Jurafsky
Pointwise Mutual Information
• Mutual information between two random variables X and Y:

I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log_2 \frac{P(x, y)}{P(x)P(y)}

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

PMI(X, Y) = \log_2 \frac{P(x, y)}{P(x)P(y)}
Dan Jurafsky
Pointwise Mutual Information
• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

PMI(X, Y) = \log_2 \frac{P(x, y)}{P(x)P(y)}

• PMI between two words:
  • How much more do two words co-occur than if they were independent?

PMI(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)P(word_2)}
Dan Jurafsky
How to Estimate Pointwise Mutual Information
• Query a search engine:
  • P(word) estimated by hits(word)/N
  • P(word_1, word_2) by hits(word_1 NEAR word_2)/N
• (Caveat: more correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word_1, word_2) but kN bigrams that are k words apart; we just use N on the rest of this slide and the next.)

PMI(word_1, word_2) = \log_2 \frac{\frac{1}{N} hits(word_1\ NEAR\ word_2)}{\frac{1}{N} hits(word_1) \cdot \frac{1}{N} hits(word_2)}
Dan Jurafsky
Does the phrase appear more with "poor" or "excellent"?

Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")

= \log_2 \frac{\frac{1}{N} hits(phrase\ NEAR\ "excellent")}{\frac{1}{N} hits(phrase) \cdot \frac{1}{N} hits("excellent")} - \log_2 \frac{\frac{1}{N} hits(phrase\ NEAR\ "poor")}{\frac{1}{N} hits(phrase) \cdot \frac{1}{N} hits("poor")}

= \log_2 \left( \frac{hits(phrase\ NEAR\ "excellent")}{hits(phrase)\, hits("excellent")} \cdot \frac{hits(phrase)\, hits("poor")}{hits(phrase\ NEAR\ "poor")} \right)

= \log_2 \frac{hits(phrase\ NEAR\ "excellent")\, hits("poor")}{hits(phrase\ NEAR\ "poor")\, hits("excellent")}
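A sketch of the final simplified form in Python; hits is a hypothetical lookup standing in for search-engine hit counts, and the small additive constant is just to avoid taking the log of zero (Turney used a similar smoothing constant).

import math

def polarity(phrase, hits, smooth=0.01):
    """log2 odds of co-occurring with 'excellent' vs. 'poor' (smoothed)."""
    num = hits(phrase + " NEAR excellent") * hits("poor") + smooth
    den = hits(phrase + " NEAR poor") * hits("excellent") + smooth
    return math.log2(num / den)

# toy hit counts just to exercise the formula
toy = {"low fees NEAR excellent": 15, "low fees NEAR poor": 5,
       "excellent": 1000, "poor": 1200}
print(polarity("low fees", lambda q: toy.get(q, 0)))   # > 0, i.e. positive polarity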
Dan Jurafsky
Learned phrases (reviews of a bank)

  Phrase                   Polarity
  online experience          2.3
  very handy                 1.4
  low fees                   0.3
  inconveniently located    -1.5
  other problems            -2.8
  unethical practices       -8.5
Dan Jurafsky
Summary on Learning Lexicons
• Why:
  • Learn a lexicon that is specific to a domain
  • Learn a lexicon with more words (more robust) than off-the-shelf lexicons
• Intuition:
  • Start with a seed set of words ('good', 'poor')
  • Find other words that have similar polarity:
    • Using "and" and "but"
    • Using words that occur nearby in the same document
  • Add them to the lexicon
Dan Jurafsky
Modern versions of lexicon learning
(Roughly the same algorithm)
• Start with a seed set of words
• Expand to words that have "similar meaning"
• Measure similarity using embeddings like word2vec: deep-learning-based vector models of meaning
• We'll cover these in week 7, Vector Semantics!
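A heavily simplified sketch of the embedding-based variant, assuming gensim and one of its downloadable pretrained vector sets; the model name and the seed words are illustrative, not from the lecture.

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")        # pretrained word vectors (downloads on first use)
pos_seeds, neg_seeds = ["good", "excellent"], ["bad", "terrible"]

# expand each seed set with its nearest neighbors in embedding space
pos_expanded = [w for w, _ in vectors.most_similar(positive=pos_seeds, topn=10)]
neg_expanded = [w for w, _ in vectors.most_similar(positive=neg_seeds, topn=10)]
print(pos_expanded)
print(neg_expanded)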
Sentiment Analysis
Learning Sentiment Lexicons

Sentiment Analysis
Other Sentiment Tasks
Dan Jurafsky
Finding the sentiment of a sentence
• Important for finding aspects or attributes
  • Target of sentiment

  The food was great but the service was awful
Dan Jurafsky
Finding the aspect/attribute/target of sentiment
• Frequent phrases + rules
  • Find all highly frequent phrases across reviews ("fish tacos")
  • Filter by rules like "occurs right after a sentiment word"
    • "…great fish tacos" means fish tacos is a likely aspect

  Casino              casino, buffet, pool, resort, beds
  Children's Barber   haircut, job, experience, kids
  Greek Restaurant    food, wine, service, appetizer, lamb
  Department Store    selection, department, sales, shop, clothing

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
Dan Jurafsky
Finding the aspect/attribute/target of sentiment
• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well understood
• Supervised classification
  • Hand-label a small corpus of restaurant review sentences with aspect
    • food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence
    • "Given this sentence, is the aspect food, décor, service, value, or NONE?"
Dan Jurafsky
Putting it all together: Finding sentiment for aspects

[Figure: pipeline. Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier and Aspect Extractor → Aggregator → Final Summary.]

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
Dan Jurafsky
Results of the Blair-Goldensohn et al. method

Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine - even the water pressure...
(+) We went because of the free room and was pleasantly pleased...
(-) …the worst hotel I had ever stayed at...

Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem...
(+) Every single hotel staff member treated us great and answered every...
(-) The food is cold and the service gives new meaning to SLOW.

Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service...
(+) Offer of free buffet for joining the Play...
Dan Jurafsky
Summary on Sentiment
• Generally modeled as a classification or regression task
  • predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naive Bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons
Sentiment Analysis
Extra
Dan Jurafsky
Analyzing the polarity of each word in IMDB
• How likely is each word to appear in each sentiment class?
• Count("bad") in 1-star, 2-star, 3-star, etc.
• But we can't use raw counts; instead, the likelihood:

P(w \mid c) = \frac{f(w, c)}{\sum_{w \in c} f(w, c)}

• Make them comparable between words with the scaled likelihood:

\frac{P(w \mid c)}{P(w)}

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
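A small sketch of computing the scaled likelihood from per-class counts; the counts below are made-up numbers (not Potts's data), and P(w) is approximated here by averaging P(w|c) over the rating classes, i.e. assuming a uniform class prior.

# hypothetical word counts per rating class; "total" is the class's token count
counts = {1:  {"bad": 30, "good": 5,  "total": 1000},
          5:  {"bad": 4,  "good": 40, "total": 1000},
          10: {"bad": 1,  "good": 60, "total": 1000}}

def scaled_likelihood(w):
    pw_c = {c: counts[c][w] / counts[c]["total"] for c in counts}   # P(w|c)
    pw = sum(pw_c.values()) / len(pw_c)                             # P(w), uniform class prior
    return {c: pw_c[c] / pw for c in pw_c}                          # P(w|c) / P(w)

print(scaled_likelihood("bad"))    # highest for the 1-star class in this toy data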
Dan Jurafsky
Analyzing the polarity of each word in IMDB
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
[Figure: "Potts diagrams" (Potts, Christopher. 2011. NSF workshop on restructuring adjectives). Scaled likelihood P(w|c)/P(w) plotted against rating (1-10) for positive scalars (good, great, excellent), negative scalars (disappointing, bad, terrible), emphatics (totally, absolutely, utterly), and attenuators (somewhat, fairly, pretty). The accompanying "Example: attenuators" panels show somewhat/r, fairly/r, and pretty/r across IMDB, OpenTable, Goodreads, and Amazon/Tripadvisor reviews, with per-corpus token counts and quadratic regression fits.]
Dan Jurafsky
Other sentiment feature: Logical negation
• Is logical negation (no, not) associated with negative sentiment?
• Potts experiment:
  • Count negation (not, n't, no, never) in online reviews
  • Regress against the review rating

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Dan Jurafsky
Potts 2011 Results: More negation in negative sentiment

[Figure: scaled likelihood P(w|c)/P(w) of negation tokens by rating category; negation is most frequent in the lowest-rated reviews.]