A Survey of Sentiment Analysis
DESCRIPTION
Sentiment analysis refers to a set of natural language processing techniques used to extract subjective information from a body of text. While sentiment analysis offers significant insight into public opinion, current implementations leave considerable room for improvement, and the field remains a young area of research. This survey gives a brief overview of the techniques commonly used to approach problems in sentiment analysis, taking into account the particular challenges posed by user-generated content in social media. It seeks to show which techniques are promising in the field in general and for user-generated content in particular.
TRANSCRIPT
![Page 1: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/1.jpg)
A Survey of Sentiment Analysis
Blockseminar “Intelligente Softwaresysteme” 2013/14, TU Berlin
7 Feb 2014, Moritz Platt
![Page 2: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Algorithms
• Benchmarks
• Outlook
Intelligente Softwaresysteme 2013/14 2
![Page 3: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/3.jpg)
Sentiment Analysis is an NLP Task
• Sentiment Analysis = Opinion Mining = Subjectivity Analysis
• Extract opinions on objects from text
• Working on natural language corpora
• Research problem with a lot of applications
• Relatively new research area, rapidly developing field
• Related fields:
  • Natural Language Processing
  • Social Media Analysis
  • Text Mining
  • Data Mining
![Page 4: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/4.jpg)
Accessing Opinions–Now and Then
Dot-Com Era and Beyond
• Huge stream of opinionated text
  • 1.2 million daily blog posts [Zabin2008]
  • 45 million daily “status updates” on Facebook [Thomas2010]
• Often featuring opinions towards products or persons
Pre Dot-Com Era
• Extensive measures
  • Surveys
  • Opinion polls
  • Focus groups
![Page 5: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/5.jpg)
Where are today’s opinionated texts coming from?
• Social Networks
• Blogs
• Reviews
![Page 6: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/6.jpg)
The Relationship Between Opinion Holders and Objects
• Consider a set of product reviews for a particular model of a cellular phone
• Edges between opinion holders and features represent opinions
• The time aspect is usually omitted
[Figure: the opinion holders John, Jack and James are linked to the feature f, “the voice quality of a particular model of a cellular phone”, which belongs to the object o, “a particular model of a cellular phone”. Each edge carries an opinionated text and a sentiment value.]
Opinionated Text → Sentiment Value
“Voice quality is wonderful.” → Positive
“Voice sounds terrible.” → Negative
“Speech quality is average.” → Neutral
![Page 7: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/7.jpg)
The Aspects of Opinions
Structure of an opinion as defined by Liu [Liu2010]:
$(o_j, f_{jk}, so_{ijkl}, h_i, t_l)$
• Object $o_j$: the target of an opinion (e.g. product, person, event, organisation, topic)
• Feature $f_{jk}$: components/attributes of an object (e.g. battery life, camera resolution)
• Sentiment Value $so_{ijkl}$: the orientation of an opinion from a set of possible choices (e.g. positive, negative, neutral)
• Opinion Holder $h_i$: the person expressing the opinion
• Time $t_l$: the time at which the opinion is expressed
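The five components of an opinion map naturally onto a small record type. The following is a minimal sketch, assuming string labels for the sentiment value; the class name `Opinion`, its field names, and the example values are illustrative and not part of Liu's definition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (o_j, f_jk, so_ijkl, h_i, t_l)."""
    obj: str         # o_j: the target of the opinion
    feature: str     # f_jk: component or attribute of the object
    sentiment: str   # so_ijkl: orientation, here a plain string label
    holder: str      # h_i: the person expressing the opinion
    time: datetime   # t_l: when the opinion was expressed


# Illustrative values only; the holder and timestamp are made up.
example = Opinion(
    obj="cellular phone model",
    feature="voice quality",
    sentiment="positive",
    holder="John",
    time=datetime(2014, 2, 7),
)
```

Freezing the dataclass keeps each extracted opinion immutable, which makes it safe to store in sets or use as a dictionary key when aggregating by object or feature.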
![Page 8: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/8.jpg)
Algorithms
![Page 9: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/9.jpg)
Approaching Sentiments Algorithmically
Unsupervised Methods
• No training data
• Cross-domain applications
• Example: Point-Wise Mutual Information
Supervised Methods
• Manually labelled training data
• Usually superior to unsupervised approaches
• Examples: Naïve Bayes Classification, Maximum Entropy Classification, Support Vector Machines
![Page 10: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/10.jpg)
PMI-IR
• PMI: Point-wise mutual information
• IR: Information retrieval
• Introduced in 2002 as an unsupervised learning algorithm for classifying reviews [Turney2002]
• Based on the concept of PMI [Church1990]
• Measures how much more often two words co-occur than would be expected if they were independent
$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{p(\mathit{word}_1 \,\&\, \mathit{word}_2)}{p(\mathit{word}_1)\, p(\mathit{word}_2)}$$
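As a rough illustration of the PMI formula, the sketch below estimates the probabilities from document-level co-occurrence in a tiny corpus. The function name `pmi_table` and the four toy documents are assumptions made for this example; real applications estimate the probabilities from far larger corpora.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return a function pmi(w1, w2) estimated from document co-occurrence."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))] / n_docs
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        p1 = word_counts[w1] / n_docs
        p2 = word_counts[w2] / n_docs
        return math.log2(joint / (p1 * p2))

    return pmi


# Toy corpus, made up for illustration.
docs = [
    "excellent voice quality",
    "excellent battery",
    "poor battery",
    "poor screen",
]
pmi = pmi_table(docs)
```

In this corpus "voice" only appears together with "excellent", so their PMI is positive, while "battery" appears equally often with both seed words and gets a PMI of zero with each.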
![Page 11: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/11.jpg)
PMI-IR
• Turney used the words poor and excellent as seeds for the algorithm
• SO is the sentiment orientation value
  • Positive SO-value for phrases more associated with excellent
  • Negative SO-value for phrases more associated with poor
• Improvement of results through IR component
  • Turney used AltaVista
  • Uses the NEAR operator
  • h(query) is the number of hits returned given the query
$$\mathrm{SO}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{“excellent”}) - \mathrm{PMI}(\mathit{phrase}, \text{“poor”})$$
$$\mathrm{SO}(\mathit{phrase}) = \log_2 \frac{h(\mathit{phrase}\ \mathrm{NEAR}\ \text{“excellent”})\; h(\text{“poor”})}{h(\mathit{phrase}\ \mathrm{NEAR}\ \text{“poor”})\; h(\text{“excellent”})}$$
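The hit-count form of SO can be computed directly once the four counts are known. This is a minimal sketch: the function name, its parameters, and the hit counts are hypothetical, and the small smoothing constant is an assumption of this example that keeps the ratio defined when a NEAR query returns no hits.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) computed from search-engine hit counts.

    `smoothing` is an assumption of this sketch; it avoids a zero
    numerator or denominator when a NEAR query has no hits.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical hit counts, not taken from any real search engine.
so = semantic_orientation(
    hits_near_excellent=400, hits_near_poor=100,
    hits_excellent=1000, hits_poor=1000,
)
label = "positive" if so > 0 else "negative"
```

A phrase that appears four times as often near "excellent" as near "poor", with equally frequent seed words, gets an SO of about 2 and is labelled positive.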
![Page 12: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/12.jpg)
Naïve Bayes Classification
• Based on Bayes' rule [Bayes1763]
• Simple to train, probabilistic, effective
• Represents an input document d as a “bag of words”
• Fixed set of classes C, e.g. C = {positive, negative}
• d can be reduced by omitting irrelevant words
All Words [Jurafsky2013]
I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet!
Opinionated Words [Jurafsky2013]
x love xxxxxxxxxxxxxxxx sweet xxxxxxx satirical xxxxxxxxxx xxxxxxxxxxx great xxxxxxx xxxxxxxxxxxxxxxxxxx fun xxxx xxxxxxxxxxxxx whimsical xxxx romantic xxxx laughing xxxxxxxxx xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx recommend xxxxx xxxx xxxxxxxxxxxxxxxxxxxxxxxx several xxxxxxxxxxxxxxxxx xxxxx happy xxxxxxxxx again xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
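The reduction from all words to opinionated words can be sketched as a filtered bag of words. The lexicon `OPINION_WORDS` below is a hypothetical hand-picked list for illustration; it is not the word list used in [Jurafsky2013].

```python
import re
from collections import Counter

# Hypothetical opinion lexicon; real systems use larger lists
# or learn the relevant words from data.
OPINION_WORDS = {"love", "sweet", "satirical", "great", "fun", "whimsical",
                 "romantic", "laughing", "recommend", "happy"}


def bag_of_words(text, vocabulary=None):
    """Count word occurrences, optionally keeping only words in `vocabulary`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return Counter(tokens)


review = "I love this movie! It's sweet, but with satirical humor. The dialogue is great."
bag = bag_of_words(review, OPINION_WORDS)
```

Word order is discarded on purpose: the classifier only sees which words occur and how often.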
![Page 13: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/13.jpg)
Naïve Bayes at work
1. Estimate P(c) of each class c by dividing the number of words in documents in c by the total number of words in the corpus
2. Estimate P(w|c) for all words w and classes c
3. The score for a document d with words $w_1, \dots, w_n$ to be in class c is
$$\mathrm{score}(d, c) = P(c) \prod_{i=1}^{n} P(w_i \mid c)$$
4. The most likely class for a document is the one with the highest score [Potts2011]
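The four steps above can be sketched in a few lines. Two choices here are assumptions of this example rather than part of the slide: add-one smoothing for P(w|c), and summing logarithms instead of multiplying probabilities to avoid numeric underflow. The training data is a made-up toy set.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """labelled_docs: iterable of (list_of_words, class_label)."""
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocabulary = set()
    for words, label in labelled_docs:
        word_counts[label].update(words)
        vocabulary.update(words)
    total_words = sum(sum(c.values()) for c in word_counts.values())
    # Step 1: P(c) as the share of corpus words found in documents of class c.
    priors = {c: sum(counts.values()) / total_words
              for c, counts in word_counts.items()}
    return priors, word_counts, vocabulary


def score(words, label, priors, word_counts, vocabulary):
    """Steps 2 and 3 in log space, with add-one smoothing (an assumption)."""
    counts = word_counts[label]
    denominator = sum(counts.values()) + len(vocabulary)
    log_score = math.log(priors[label])
    for w in words:
        if w in vocabulary:  # words never seen in training are skipped
            log_score += math.log((counts[w] + 1) / denominator)
    return log_score


def classify(words, priors, word_counts, vocabulary):
    """Step 4: pick the class with the highest score."""
    return max(priors, key=lambda c: score(words, c, priors, word_counts, vocabulary))


# Toy training data, made up for illustration.
training = [
    ("great fun movie".split(), "positive"),
    ("love this great film".split(), "positive"),
    ("terrible boring movie".split(), "negative"),
    ("poor dull film".split(), "negative"),
]
model = train_naive_bayes(training)
prediction = classify("great fun film".split(), *model)
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest score in the product form.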
![Page 14: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/14.jpg)
Maximum Entropy Classification
“Ignorance is preferable to error, and he is less remote from the truth who believes nothing than he who believes what is wrong.” (Thomas Jefferson)
• Find weights for the features that maximize the likelihood of the training data
• Add constraints based on training data
• More constraints = less entropy = distribution is closer to data
• More difficult to implement than Naïve Bayes [Potts2011]
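For two classes, a maximum entropy classifier is equivalent to logistic regression, so weight fitting can be sketched with plain gradient ascent on the log-likelihood. The function names, the hyperparameters (`epochs`, `learning_rate`), and the toy examples are assumptions of this sketch; production implementations add regularization and use better optimizers.

```python
import math


def train_maxent(examples, epochs=200, learning_rate=0.5):
    """Binary maximum entropy model trained by gradient ascent.

    examples: list of (set_of_feature_names, label) with label in {0, 1}.
    """
    weights = {}
    for _ in range(epochs):
        gradient = {}
        for features, label in examples:
            z = sum(weights.get(f, 0.0) for f in features)
            predicted = 1.0 / (1.0 + math.exp(-z))
            # Log-likelihood gradient: observed minus expected feature value.
            for f in features:
                gradient[f] = gradient.get(f, 0.0) + (label - predicted)
        for f, g in gradient.items():
            weights[f] = weights.get(f, 0.0) + learning_rate * g
    return weights


def probability_positive(features, weights):
    """P(positive | features) under the trained weights."""
    z = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


# Toy training data, made up for illustration.
examples = [
    ({"great", "fun"}, 1),
    ({"love", "great"}, 1),
    ({"terrible", "boring"}, 0),
    ({"poor", "dull"}, 0),
]
weights = train_maxent(examples)
```

The gradient vanishes exactly when the model's expected feature counts match the observed ones, which is the constraint view of maximum entropy mentioned above.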
![Page 15: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/15.jpg)
Support Vector Machines
• Most intuitive for two-class, separable training data sets
• Find a hyperplane (a line in two dimensions) that separates the data sets while maximizing the margin (A vs B)
• The margin is limited by support vectors
• Applicable to more complicated problems too
  • n-class space
  • inseparable training data through transformation into higher dimensions
[Figure: two-class data plotted on x and y axes, with labels A and B]
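The margin criterion can be made concrete by comparing two candidate separating lines on the same data. This sketch reads “A vs B” as such a comparison, which is an assumption about the missing figure; the data points and both separators are made up, and the code only evaluates margins rather than training an SVM.

```python
import math


def geometric_margin(w, b, points):
    """Smallest signed distance of any labelled point to the line w.x + b = 0.

    points: list of ((x, y), label) with label in {+1, -1}.
    A negative result means at least one point is misclassified.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x + w[1] * y + b) / norm
               for (x, y), label in points)


# Toy, linearly separable data made up for illustration.
data = [((1, 1), -1), ((2, 1), -1), ((4, 4), +1), ((5, 4), +1)]

# Hypothetical candidate separators, each given as (w, b).
separator_a = ((1.0, 1.0), -5.5)
separator_b = ((1.0, 0.0), -3.0)

margin_a = geometric_margin(*separator_a, data)
margin_b = geometric_margin(*separator_b, data)
best = "A" if margin_a > margin_b else "B"
```

Both lines classify every point correctly, but A keeps a larger distance to the nearest points, so a maximum-margin learner would prefer it over B. The points that attain the minimum distance play the role of support vectors.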
![Page 16: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/16.jpg)
Benchmarks
![Page 17: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/17.jpg)
Benchmarking Sentiment Analysis
• Benchmarking NB and ME with in-domain testing [Potts2011]
• Binary classification
• 6,000 restaurant reviews
![Page 18: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/18.jpg)
Benchmarking Sentiment Analysis
• Benchmarking NB and ME with testing on a different domain [Potts2011]
• Binary classification
• Trained on 6,000 restaurant reviews
• Tested on 6,000 product reviews
![Page 19: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/19.jpg)
Outlook
![Page 20: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/20.jpg)
Opinionated Data in the Wild
• Works well under laboratory conditions
  • Proper spelling
  • Highly opinionated
  • Pre-defined object
• Common NLP problems still remain
  • Named entity recognition
  • Context-specific meaning
  • Language ambiguity
• Benchmarking corpora do not reflect real-world data quality
![Page 21: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/21.jpg)
Opinionated Data in the Wild
• Social media data
  • Highly relevant
  • Huge corpus
  • Constantly growing
• Very noisy
  • Questionable text quality
    • Spelling
    • Grammar
  • Spam
  • Unclear context
  • Figurative speech
  • Slang
  • Irony
Warren Scott M. (21 January at 18:31): “your mxf format is a joke. DO NOT BUY CANON”
Leon H. (11 January at 10:58): “Why battery 6L in my Canon sx280 have pretty low life”
Phil D. (11 January at 04:39): “Youse guys did a solid on my wife's TI3- warranty expired lastmonth, but did the job good! Thanks Canon”
Cole J. (28 January 2010): “Got a canon gl1 I love it, but a little fuzzy”
Authentic status updates from https://www.facebook.com/pages/Canon-Cameras
![Page 22: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/22.jpg)
Conclusions / Future Work
• Development of algorithms is on the right track
  • Evolution beyond binary classification
  • Algorithms will become more robust on less homogeneous sources
• Industry aims to apply algorithms to noisy data
![Page 23: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/23.jpg)
Appendix
![Page 24: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/24.jpg)
References
[Bayes1763] Bayes, T. An essay towards solving a problem in the doctrine of chances. Phil. Trans. of the Royal Soc. of London, 1763, Vol. 53, pp. 370-418.
[Church1990] Church, K.W. & Hanks, P. Word Association Norms, Mutual Information, and Lexicography. Comput. Linguist., MIT Press, 1990, Vol. 16(1), pp. 22-29.
[Jurafsky2013] Jurafsky, D. Naïve Bayes and Text Classification. 2013.
[Liu2010] Liu, B. Sentiment analysis and subjectivity. Handbook of Natural Language Processing, Second Edition. Taylor and Francis Group, Boca, 2010.
[Potts2011] Potts, C. Sentiment Symposium Tutorial: Classifiers. http://sentiment.christopherpotts.net/classifiers.html, 2011.
![Page 25: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/25.jpg)
[Thomas2010] Thomas, A. & Applegate, J. Pay Attention!: How to Listen, Respond, and Profit from Customer Feedback. Wiley, 2010.
[Turney2002] Turney, P.D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings 40th Annual Meeting of the ACL (2002), 2002, pp. 417-424.
[Zabin2008] Zabin, J. & Jefferies, A. Social Media Monitoring and Analysis: Generating Consumer Insights from Online Conversation. Aberdeen Group Benchmark Report, 2008.
![Page 26: A Survey of Sentiment Analysis](https://reader033.vdocument.in/reader033/viewer/2022051400/54c66f5d4a795913618b460e/html5/thumbnails/26.jpg)
Picture Credit
Icons
Page 8: Arrow by Jamison Wieser from The Noun Project
Photography
Page 1: “Thumbs up on diving down” by JamesHuckaby is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic License. Based on a work at http://www.flickr.com/photos/raveller/1117899371/. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/2.0/legalcode.
Page 3: “Coventry Solihull Warwickshire Sub-Regional Planning Study Questionnaire” by TheJRJamesArchive is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License. Based on a work at http://www.flickr.com/photos/jrjamesarchive/9371523446/. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.0/legalcode.
Page 14: “Svm intro.svg” by FabianBürger is licensed under a Creative Commons Attribution 3.0 License. Based on a work at http://commons.wikimedia.org/wiki/File:Svm_intro.svg. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/legalcode.