Great Food, Lousy Service
Topic Modeling for Sentiment Analysis in Sparse Reviews
Robin Melnick rmelnick@stanford.edu
Dan Preston dpreston@stanford.edu
OpenTable.com
Short
[Charts: review length in characters and in words]
Sparse
“An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha.
Divine. Inspirational and a great value.”
• Food?
• Ambiance?
• Service?
• Noise?
Skewed
Correlations
SVM + Features, Features, Features!
• tokenize punctuation
• "white list" (only use sentiment words)
• id, neutralize proper nouns
• remove stop words
• strip numbers
• POS tagging, ADJ only
• contraction splitting
• POS tagging, add ADV
• lower casing
• Brill tagger
• unigram (Bag of Words)
• sentiment "white list" (Harvard lexicon)
• bigram
• count of sentiment words (pos/neg)
• trigram
• balanced training set
• mixed n-grams
• binary accuracy
• ignore stop words
• sub-topic classifiers, hand list
• stemming
• WordNet topic list expansion
• negation processing
• topic-filtered n-grams
• expanded negation processing
• topic-word proximity filtering
• large training set size
• strict entropy modeling
• varying dictionary size
• frequency-weighted entropy modeling
• SVM scaling

• 30+ preprocessing and SVM classification features
• ~50 configurations
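A few of the preprocessing steps listed above (lower casing, stripping numbers and punctuation, stop-word removal, mixed n-grams) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop list and the function names `preprocess` and `ngram_counts` are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; the slides do not say which list was used).
STOP_WORDS = {"a", "an", "the", "was", "is", "and", "of", "our"}

def preprocess(text):
    """Lower-case, keep alphabetic tokens only (drops punctuation and numbers),
    then remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def ngram_counts(tokens, max_n=2):
    """Count every n-gram of length 1..max_n (a "mixed n-grams" bag of features)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

features = ngram_counts(preprocess("The food was great and the service was great"))
```

Here `features` maps each unigram and bigram to its count; a sparse vector built from such counts is the kind of input an SVM classifier would take.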
Key Features
• Stemming
  • Porter 1980 via NLTK
  • <fast>, <faster>, <fastest> → <fast>
• Negation processing
  • (enhanced approach from Pang et al. 2002)
  • “Not a great experience.” → NOT_great
  • “They never disappoint!” → NOT_disappoint
• Net sentiment count
  • pos/neg lexicon (Harvard General Inquirer)
  • running +/- count
  • “Incredible(+) food, but our server was rude(-).” → (0)
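Negation processing and the net sentiment count can be sketched in a few lines. This is a minimal illustration of the basic Pang et al. style of negation tagging (prefix every token between a negator and the next punctuation mark), not the enhanced variant the slide refers to. The negator set and the two toy word sets standing in for the Harvard General Inquirer lexicon are assumptions.

```python
import re

NEGATORS = {"not", "no", "never"}        # assumed trigger words
PUNCT = set(".,!?;")                     # negation scope ends at punctuation
POSITIVE = {"incredible", "great"}       # toy stand-in for the pos lexicon
NEGATIVE = {"rude", "lousy"}             # toy stand-in for the neg lexicon

def tag_negation(text):
    """Prefix NOT_ to each word that follows a negator, up to the next punctuation mark."""
    out, negating = [], False
    for tok in re.findall(r"[A-Za-z']+|[.,!?;]", text):
        if tok in PUNCT:
            negating = False
            continue
        word = tok.lower()
        if word in NEGATORS:
            negating = True
            continue
        out.append("NOT_" + word if negating else word)
    return out

def net_sentiment(text):
    """Running +/- count: +1 per positive lexicon word, -1 per negative one."""
    score = 0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score
```

With these definitions, `tag_negation("They never disappoint!")` yields `they` and `NOT_disappoint`, and `net_sentiment("Incredible food, but our server was rude.")` is 0. For "Not a great experience." this basic version tags every word up to the period, whereas the slide shows only NOT_great, so the enhanced approach evidently differs in details not given here.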
Results (so far)
• Trained on 10,000 reviews
• Tested on ~80,000 reviews
• Accuracy
  • Baseline: 50.0%
  • Intermediate model: 56.6% (1.13x)
• abs( average scoring delta ): 0.56
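The metrics on this slide can be computed as below. This sketch reads "abs( average scoring delta )" literally, as the absolute value of the mean signed difference between predicted and actual ratings; that reading, the function names, and the toy ratings are assumptions.

```python
def accuracy(predicted, actual):
    """Fraction of reviews whose predicted rating equals the actual rating."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

def abs_average_delta(predicted, actual):
    """Absolute value of the mean signed difference (predicted - actual)."""
    deltas = [p - a for p, a in zip(predicted, actual)]
    return abs(sum(deltas) / len(deltas))

def improvement(model_acc, baseline_acc):
    """Ratio of model accuracy to baseline accuracy, e.g. 56.6 / 50.0."""
    return model_acc / baseline_acc

# Toy ratings, invented for illustration only.
predicted = [5, 4, 3, 5]
actual = [5, 5, 2, 3]
```

On the slide's own figures, `improvement(56.6, 50.0)` rounds to 1.13, matching the reported 1.13x. Under the literal reading, over- and under-predictions cancel, so a small delta indicates low bias rather than low per-review error.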
Topic Modeling
Hand-seeded topic-word list expanded via WordNet SynSets
1. sub-topic classifiers
2. topic-filtered n-grams
  • <soup_FOOD was fantastic_ADJ>
  • <fantastic_ADJ soup_FOOD was>
3. topic-word proximity filtering
  • both above → <fantastic_ADJ/FOOD>
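Topic-word proximity filtering can be sketched as pairing each adjective with the nearest topic word inside a small window, so that both word orders collapse to one feature. The topic lists, the adjective set (standing in for a POS tagger), the window size, and the nearest-word rule are all assumptions made for illustration.

```python
# Toy topic-word lists and adjective set (hypothetical; the slides use a
# hand-seeded list expanded via WordNet and a POS tagger instead).
TOPIC_WORDS = {
    "FOOD": {"soup", "steak", "food"},
    "SERVICE": {"server", "waiter", "service"},
}
ADJECTIVES = {"fantastic", "rude", "great", "slow"}

def topic_of(word):
    """Return the topic label for a topic word, or None."""
    for topic, words in TOPIC_WORDS.items():
        if word in words:
            return topic
    return None

def proximity_features(tokens, window=2):
    """Emit adjective/TOPIC for each adjective with a topic word within `window` tokens.
    The nearest topic word wins; on a tie the left neighbour is used."""
    features = []
    for i, tok in enumerate(tokens):
        if tok not in ADJECTIVES:
            continue
        for dist in range(1, window + 1):
            neighbours = [j for j in (i - dist, i + dist) if 0 <= j < len(tokens)]
            topics = [t for t in (topic_of(tokens[j]) for j in neighbours) if t]
            if topics:
                features.append(f"{tok}/{topics[0]}")
                break
    return features
```

Both `"soup was fantastic".split()` and `"fantastic soup was".split()` produce the single feature `fantastic/FOOD`, which is the order-independence the slide's example points at.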
Results:
                Food     Ambiance  Service  Noise
Approach 1.     39.15%   47.26%    53.70%   48.43%
Approach 3.     40.05%   47.88%    54.92%   50.35%
Gain (3 vs. 1)  1.02x    1.01x     1.02x    1.03x
Word-Rating Distributions
[Figure panels: rating distributions for the words “worst”, “mediocre”, “decent”, “solid”, “exceeded”]
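A word-rating distribution of this kind can be estimated by counting, for each word, how often it appears in reviews of each rating. The sketch below assumes each word is counted once per review and uses invented toy reviews; neither detail comes from the slides.

```python
from collections import Counter, defaultdict

def word_rating_distributions(reviews):
    """reviews: iterable of (text, rating) pairs.
    Returns {word: {rating: share of that word's reviews with this rating}}."""
    counts = defaultdict(Counter)
    for text, rating in reviews:
        for word in set(text.lower().split()):  # count each word once per review
            counts[word][rating] += 1
    return {
        word: {r: c / sum(rc.values()) for r, c in rc.items()}
        for word, rc in counts.items()
    }

# Toy data, invented for illustration only.
reviews = [
    ("worst meal ever", 1),
    ("decent food", 3),
    ("decent value", 4),
    ("solid choice", 4),
]
dist = word_rating_distributions(reviews)
```

In this toy data "worst" occurs only in a 1-star review, while "decent" is split evenly between 3 and 4 stars.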
Frequency-Weighted Entropy Model
• Accuracy
  • Baseline: 50.0%
  • Intermediate model: 56.6%
  • Best (entropy) model: 58.6% (1.17x)
• abs( average scoring delta ): 0.56 → 0.52
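The slides do not spell out the entropy model, so the following is only one plausible reading: score each word by how concentrated its rating distribution is (low Shannon entropy), weighted by how often the word occurs. The weighting formula in `weighted_score`, the function names, and the 5-point rating scale are hypothetical.

```python
import math

def entropy(distribution):
    """Shannon entropy (in bits) of a {rating: probability} distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def weighted_score(rating_counts, num_ratings=5):
    """Hypothetical frequency weighting: word count times the gap between the
    maximum possible entropy and the word's actual rating entropy."""
    total = sum(rating_counts.values())
    dist = {r: c / total for r, c in rating_counts.items()}
    max_entropy = math.log2(num_ratings)  # uniform over all ratings
    return total * (max_entropy - entropy(dist))
```

Under this sketch a word seen ten times, always at one rating, outscores a word seen ten times but split between two ratings, and rare words score lower than frequent ones with the same distribution.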