mike davies sentiment_analysis_presentation_backup
DESCRIPTION
My Michaelmas fourth year presentation on a CUED fourth year project: Sentiment Analysis.
TRANSCRIPT
![Page 1: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/1.jpg)
Sentiment Analysis
1. Discover a niche network of Twitter users
2. Model their emotions on topics
3. Use feelings to more accurately predict a time series, e.g. the stock market or box office success
4. Are some [users/networks] more influential than others?
![Page 2: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/2.jpg)
This Talk
The Design Decision
The Core Goals
The 3 parts of the project:
1. Classifying the SENTIMENT of tweets
2. Building a NETWORK of Twitter users
3. Finding a TIME SERIES of sentiment for each user
![Page 3: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/3.jpg)
Sentiment Analysis Used Already
Derwent Capital Markets - "The Twitter hedge fund"
£25m fund
10% of tweets predict Dow Jones movement direction with 87.6% accuracy
Returned 1.85% in its first month of trading
Johan Bollen, Indiana University, used bag-of-words approach
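As a rough illustration of the bag-of-words idea mentioned on this slide, a tweet can be scored by counting words from a positive list and a negative list while ignoring word order. This is only a sketch: the word lists and the function name `bag_of_words_score` are invented here and are not taken from Bollen's work or from the fund.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons; a real system would use a much larger word list.
POSITIVE = {"good", "great", "fantastic", "calm", "happy"}
NEGATIVE = {"bad", "poor", "awful", "anxious", "sad"}

def bag_of_words_score(text):
    """Return (#positive words - #negative words), ignoring word order."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    pos = sum(count for word, count in words.items() if word in POSITIVE)
    neg = sum(count for word, count in words.items() if word in NEGATIVE)
    return pos - neg

print(bag_of_words_score("Fantastic movie, great cast"))  # 2
print(bag_of_words_score("Bad plot and poor acting"))     # -2
```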
![Page 4: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/4.jpg)
Sentiment Analysis Used Already
Product reviews / ratings
![Page 5: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/5.jpg)
Sentiment Analysis Used Already
Social Media Analytics
![Page 6: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/6.jpg)
Design Decision
Many paragraphs of text (Product Reviews)
+ : Better accuracy of prediction
- : Less data overall
Huge number of short pieces of text (Twitter)
+ : Opinions of a greater number of people, at high enough frequency to model as a signal
- : Classification of opinion is very poor
![Page 7: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/7.jpg)
2 Current Aims (will change later)
1. Project aims to be context independent (e.g. movies & products)
2. When context is given, use it to better classify tweets
![Page 8: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/8.jpg)
1: Sentiment Analysis of Tweets
Three-tier classification process:
tweet -> spam / not spam
not spam -> objective / subjective
subjective -> positive / negative
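The three tiers can be read as a cascade in which each stage only passes on the tweets it does not settle itself. A minimal sketch follows, assuming that only non-spam tweets reach the subjectivity stage and only subjective tweets reach the polarity stage. The three `toy_*` rules are placeholders standing in for trained classifiers and are not part of the project.

```python
def classify_tweet(tweet, is_spam, is_subjective, is_positive):
    """Run the three tiers in order, stopping as soon as one decides."""
    if is_spam(tweet):
        return "spam"
    if not is_subjective(tweet):
        return "objective"
    return "positive" if is_positive(tweet) else "negative"

# Placeholder rules standing in for trained classifiers (assumptions only).
def toy_is_spam(tweet):
    return "http://" in tweet and "win" in tweet.lower()

def toy_is_subjective(tweet):
    return any(w in tweet.lower() for w in ("love", "hate", "fantastic", "awful"))

def toy_is_positive(tweet):
    return any(w in tweet.lower() for w in ("love", "fantastic"))

examples = [
    "WIN a phone http://spam.example",
    "The film opens on Friday",
    "Fantastic movie",
    "I hate this phone",
]
for text in examples:
    print(text, "->", classify_tweet(text, toy_is_spam, toy_is_subjective, toy_is_positive))
```

Passing the classifiers in as arguments keeps the cascade itself independent of how each tier is eventually trained.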
![Page 9: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/9.jpg)
1: Sentiment Analysis of Tweets
Double-Back Propagation Algorithm
ACL Journal, March 2011, MIT Press
Opinion Word Extraction & Target Extraction
4 rules
"The phone has a good screen"
=> add "good" to list of adjectives
=> add "screen" to list of nouns
Etc.
Great for rating features of a product
Not great for tweets
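The "good screen" example can be sketched as a loop that grows two lists from each other. This is a heavy simplification and not the paper's four rules: it assumes the (adjective, noun) pairs have already been extracted by a parser and handles only that one relation. The function name `propagate` and the sample pairs are made up for illustration.

```python
def propagate(pairs, seed_opinion_words):
    """Grow opinion-word and target lists from (adjective, noun) pairs.

    `pairs` are assumed to be adjective-modifies-noun relations already
    extracted by a parser; only this one relation type is handled here.
    """
    opinion_words = set(seed_opinion_words)
    targets = set()
    changed = True
    while changed:
        changed = False
        for adjective, noun in pairs:
            # Known opinion word -> the noun it modifies becomes a target.
            if adjective in opinion_words and noun not in targets:
                targets.add(noun)
                changed = True
            # Known target -> an adjective modifying it becomes an opinion word.
            if noun in targets and adjective not in opinion_words:
                opinion_words.add(adjective)
                changed = True
    return opinion_words, targets

pairs = [("good", "screen"), ("sharp", "screen"), ("sharp", "camera")]
opinion_words, targets = propagate(pairs, {"good"})
print(sorted(opinion_words))  # ['good', 'sharp']
print(sorted(targets))        # ['camera', 'screen']
```

Starting from the single seed "good", the loop learns "screen" as a target, then "sharp" as an opinion word, then "camera" as a further target.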
![Page 10: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/10.jpg)
1: Sentiment Analysis of Tweets
Twitter Part Of Speech (POS) tagger:
www.ark.cs.cmu.edu/TweetNLP/
Written in Java
Max Ent
Example (token/tag): "/^ Drive/^ "/^ ,/, go/V and/& watch/V it/O !/, Fantastic/A movie/N ./,
![Page 11: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/11.jpg)
Bootstrapped Tweet SA improver
[Diagram: IMDB Movie Review Corpora; Double-BackProp. Algo (gives useful adjectives, nouns); incoming tweets; Sentiment Analysis]
![Page 12: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/12.jpg)
2: Building a Network
Collected my Twitter friends, friends of friends, friends of friends of friends.
=> 115,896 users
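Collecting friends, friends of friends and friends of friends of friends amounts to a breadth-first crawl to depth 3. A minimal sketch is below. `get_friends` is a placeholder for whatever call returns a user's friend list (for example via the Twitter API), and the toy friend lists are invented so the example runs offline.

```python
from collections import deque

def collect_network(start_user, get_friends, max_depth=3):
    """Breadth-first crawl: friends, friends of friends, and so on."""
    depth_of = {start_user: 0}
    queue = deque([start_user])
    while queue:
        user = queue.popleft()
        if depth_of[user] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in get_friends(user):
            if friend not in depth_of:
                depth_of[friend] = depth_of[user] + 1
                queue.append(friend)
    return depth_of

# Toy friend lists standing in for API calls (hypothetical data).
toy_friends = {"me": ["a", "b"], "a": ["c"], "b": ["a", "d"], "c": ["e"], "e": ["f"]}
network = collect_network("me", lambda user: toy_friends.get(user, []))
print(network)  # {'me': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
```

User "f" is four hops away, so it is not collected. Recording the depth of each user also prevents the same account being fetched twice.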
![Page 13: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/13.jpg)
2: Building a Network
![Page 14: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/14.jpg)
2: Building a Network
Community detection:
Paper 1: Near linear time algorithm for detecting community structures on large scale networks
Paper 2: An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks (Haizheng Zhang)
![Page 15: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/15.jpg)
2: Building a Network
Like MapReduce
Instead of "map" and "reduce":
Map = 'Update': modify overlapping sets of data
Reduce = 'Sync': perform reductions in the background while sync is running
Label Propagation & LDA
![Page 16: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/16.jpg)
3: Time series prediction
Will get time series from Python to R using the rpy2 module
R has a great package "quantmod" for importing financial market data.
Can also import other time series very easily, and R has many great libraries.
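Before anything is handed to R, the classified tweets have to be turned into a sentiment time series per user. One simple way to do that, sketched here under assumptions, is to average the scores per user per day. The record layout `(user, day, score)`, the +1/-1 scores and the sample dates are all invented for illustration.

```python
from collections import defaultdict
from datetime import date

def daily_sentiment(scored_tweets):
    """Average sentiment per user per day from (user, day, score) records."""
    totals = defaultdict(lambda: [0.0, 0])  # (user, day) -> [sum, count]
    for user, day, score in scored_tweets:
        totals[(user, day)][0] += score
        totals[(user, day)][1] += 1
    series = defaultdict(dict)
    for (user, day), (total, count) in totals.items():
        series[user][day] = total / count
    # Sort each user's days so the result reads as a time series.
    return {user: dict(sorted(days.items())) for user, days in series.items()}

# Made-up records: +1 for a positive tweet, -1 for a negative one.
scored = [
    ("alice", date(2011, 11, 2), 1),
    ("alice", date(2011, 11, 1), -1),
    ("alice", date(2011, 11, 1), 1),
    ("bob", date(2011, 11, 1), 1),
]
series = daily_sentiment(scored)
print(series["alice"])
print(series["bob"])
```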
![Page 17: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/17.jpg)
Built With
Python - For majority of code
Packages: numpy, scipy, matplotlib, networkx, graphviz, rpy2, django, twython, nltk
R - For time series analysis
PostgreSQL - SQL database
Java - Twitter POS tagger
C/C++ - GraphLab
![Page 18: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/18.jpg)
End Product
[Diagram: IMDB Movie Review Corpora; Double-BackProp. Algo; incoming tweets; Sentiment Analysis]
![Page 19: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/19.jpg)
Thank You
Mike Davies
Documented at www.m1ked.com
![Page 20: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/20.jpg)
Notes: Vowpal Wabbit LDA
Vowpal Wabbit is an open source library for fast online learning (mostly SGD) mainly developed by a guy at Yahoo.
Optimised for speed
LDA uses clever tricks like vectorisation and floating point representation to avoid using pow() and exp() functions.
![Page 21: Mike davies sentiment_analysis_presentation_backup](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6bc644a7959c35a8b4591/html5/thumbnails/21.jpg)
Notes: Label Propagation
Label Propagation has been proven to be an effective semi-supervised learning approach in many applications. The key idea behind label propagation is to first construct a graph in which each node represents a data point and each edge is assigned a weight often computed as the similarity between data points, then propagate the class labels of labeled data to neighbors in the constructed graph in order to make predictions.
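The idea in the note above can be sketched in a few lines: labelled nodes keep their labels, and every other node repeatedly takes the label with the largest total edge weight among its already-labelled neighbours. This is a simplified voting variant written for illustration, not the exact algorithm from any particular paper. The function name, the tie-breaking rule and the toy graph are assumptions.

```python
def propagate_labels(edges, seed_labels, max_rounds=10):
    """Spread labels from labelled nodes to neighbours by weighted vote.

    edges: {(node_a, node_b): weight} for an undirected similarity graph.
    seed_labels: {node: label} for the labelled data points (kept fixed).
    """
    neighbours = {}
    for (a, b), weight in edges.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight

    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updated = dict(labels)
        for node in sorted(neighbours):
            if node in seed_labels:
                continue  # labelled data points never change
            votes = {}
            for other, weight in neighbours[node].items():
                if other in labels:
                    votes[labels[other]] = votes.get(labels[other], 0.0) + weight
            if votes:
                # Highest total weight wins; ties are broken alphabetically.
                updated[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if updated == labels:
            break  # nothing changed, so the labelling has settled
        labels = updated
    return labels

# Toy chain a-b-c-d-e with a weak link between c and d (made-up weights).
edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.1, ("d", "e"): 0.9}
result = propagate_labels(edges, {"a": "positive", "e": "negative"})
print(sorted(result.items()))
```

In this toy graph the weak c-d edge acts as the boundary: a, b and c end up "positive" while d and e end up "negative".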