Twitter Sentiment Analysis
The University of Texas at Dallas utdallas.edu
Airline Twitter Analysis
What did we want to do?
• Kaggle: Twitter Airlines Sentiments
• Exploratory Analysis
  i. When do people tweet?
  ii. Which airline gets the most tweets?
  iii. Which sentiments are dominant?
  iv. How are these sentiments distributed?
• Text Analytics
  i. Most frequently used words
  ii. Most frequently used words when the sentiment is negative
  iii. Most frequently used words when the sentiment is positive
  iv. Tweet length vs. sentiment
Cleansing of data
• Every tweet had “@airline name” at the beginning
• Four columns had hardly any data
• Null and missing values
• Coordinates required geocoding
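The cleansing steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the slides: the sample rows, the column names (`text`, `tweet_coord`, `negativereason_gold`) and the 50% fill threshold are all assumptions.

```python
import re

# Hypothetical sample rows; the column names are assumptions.
rows = [
    {"text": "@united my bag is lost again", "tweet_coord": "", "negativereason_gold": ""},
    {"text": "@JetBlue thanks for the great flight!", "tweet_coord": "", "negativereason_gold": ""},
    {"text": None, "tweet_coord": "[40.7, -74.0]", "negativereason_gold": ""},
]

MENTION = re.compile(r"^\s*@\w+\s*")

def strip_leading_mention(text):
    """Remove a single leading "@airline" handle from a tweet."""
    return MENTION.sub("", text, count=1)

def sparse_columns(rows, min_fill=0.5):
    """Return columns whose share of non-empty values is below min_fill."""
    sparse = []
    for col in rows[0].keys():
        filled = sum(1 for r in rows if r.get(col) not in (None, ""))
        if filled / len(rows) < min_fill:
            sparse.append(col)
    return sparse

def clean(rows, min_fill=0.5):
    """Drop sparse columns, skip rows without text, strip the leading handle."""
    drop = set(sparse_columns(rows, min_fill))
    drop.discard("text")  # never drop the tweet text itself
    cleaned = []
    for r in rows:
        if not r.get("text"):  # skip rows with a missing tweet text
            continue
        kept = {k: v for k, v in r.items() if k not in drop}
        kept["text"] = strip_leading_mention(kept["text"])
        cleaned.append(kept)
    return cleaned

cleaned = clean(rows)
```

Only the leading handle is removed, so mentions inside the tweet body are left in place for later analysis.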
When do people tweet?
• Most of the tweets come in during the morning rush hours, peaking at 9 am.
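One way to get an hourly count like this is sketched below. The timestamps and their format are hypothetical stand-ins for the dataset's creation-time column.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; the format is an assumption about the dataset.
created = [
    "2015-02-24 09:15:36",
    "2015-02-24 09:47:02",
    "2015-02-24 14:05:10",
    "2015-02-23 09:01:59",
]

def tweets_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets for each hour of the day (0-23)."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]

counts = tweets_per_hour(created)
# Hour with the largest count (first one wins on ties).
peak_hour = max(range(24), key=counts.__getitem__)
```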
[Figure: Number of tweets every hour. x-axis: Hour (0 to 25); y-axis: Number of tweets (0 to 1200)]
How are the tweets & sentiments distributed?
• United Airlines, American Airlines and US Airways receive most of the tweets.
• As expected, most of the tweets are negative.
• 63% of the tweets are negative.
Distribution of sentiments for all the airlines
Sentiment   Frequency
Positive    0.1706621
Neutral     0.2295947
Negative    0.5997432
• The three airlines with the most tweets are also the ones with the most negative tweets. Why?
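A sketch of how sentiment counts and shares might be tabulated per airline is given below, using hypothetical (airline, sentiment) pairs. Comparing shares rather than raw counts helps separate "gets many tweets" from "gets a high proportion of negative tweets".

```python
from collections import Counter, defaultdict

# Hypothetical (airline, sentiment) pairs standing in for the dataset.
labels = [
    ("United", "negative"), ("United", "negative"), ("United", "neutral"),
    ("Delta", "positive"), ("Delta", "negative"),
    ("US Airways", "negative"),
]

def sentiment_shares(pairs):
    """Return the overall proportion of each sentiment."""
    counts = Counter(sentiment for _, sentiment in pairs)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def sentiment_by_airline(pairs):
    """Return per-airline sentiment counts."""
    table = defaultdict(Counter)
    for airline, sentiment in pairs:
        table[airline][sentiment] += 1
    return table

overall = sentiment_shares(labels)
per_airline = sentiment_by_airline(labels)
# Share of negative tweets within each airline's own tweets.
negative_share = {a: c["negative"] / sum(c.values()) for a, c in per_airline.items()}
```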
Why so many negative tweets?
Word clouds to show frequency of words used in negative tweets
[Word clouds: US Airways, United Airlines, American Airlines]
An outlier in the case of Delta Airlines
Word cloud for all the positive tweets
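A word cloud is a picture of word frequencies, so the counts behind the negative and positive clouds could be computed along these lines. The tweets and the tiny stop-word list are illustrative assumptions; a real analysis would use a fuller stop-word list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (assumption).
STOP_WORDS = {"the", "a", "is", "my", "for", "to", "and", "was", "you", "i"}

# Hypothetical (sentiment, text) pairs.
tweets = [
    ("negative", "my flight was cancelled and the bag is lost"),
    ("negative", "flight delayed again, cancelled flight"),
    ("positive", "thanks for the great flight"),
    ("positive", "great crew, thanks"),
]

def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def top_words(tweets, sentiment, n=3):
    """Return the n most frequent words among tweets with the given sentiment."""
    counts = Counter()
    for label, text in tweets:
        if label == sentiment:
            counts.update(tokenize(text))
    return counts.most_common(n)

top_negative = top_words(tweets, "negative", n=2)
top_positive = top_words(tweets, "positive", n=2)
```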
From which time zones are people tweeting?
• Flights travel all over the world.
• Yet we observed that most of the tweets originate from the Eastern Time zone (US & Canada).
Association Analysis
• Association analysis on the words used in the tweets.
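The slides do not say which algorithm or thresholds were used. As a minimal sketch of the idea, the following computes support and confidence for word pairs, treating each tweet as a "basket" of distinct words. The baskets and the `min_support` value are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Each tweet reduced to its set of distinct, already cleaned words (hypothetical).
baskets = [
    {"flight", "cancelled", "rebook"},
    {"flight", "delayed", "hours"},
    {"flight", "cancelled", "refund"},
    {"bag", "lost"},
]

def pair_rules(baskets, min_support=0.5):
    """Return rules (a -> b) mapped to (support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(w for b in baskets for w in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of tweets containing both words
        if support < min_support:
            continue
        # Confidence of a -> b: share of tweets with a that also contain b.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

rules = pair_rules(baskets)
```

In this toy data every tweet containing "cancelled" also contains "flight", so that rule has confidence 1.0, while the reverse rule is weaker.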
Hierarchical clustering to determine association between words
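The distance measure and linkage used for the dendrograms are not stated in the slides. The sketch below assumes Jaccard distance between the sets of tweets in which each word occurs, with single linkage, on hypothetical data.

```python
# Hypothetical term occurrences: word -> set of tweet ids containing it.
occurrences = {
    "cancelled": {1, 2, 3},
    "rebook": {1, 2},
    "bag": {4, 5},
    "lost": {4, 5, 6},
    "thanks": {7},
}

def jaccard_distance(a, b):
    """1 - (size of intersection / size of union) for two sets of tweet ids."""
    return 1 - len(a & b) / len(a | b)

def single_linkage(occurrences, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[word] for word in sorted(occurrences)]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    jaccard_distance(occurrences[a], occurrences[b])
                    for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

clusters, merges = single_linkage(occurrences, n_clusters=3)
```

The recorded `merges` list holds the pairs joined at each step with their distance, which is the information a dendrogram draws.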
K-means clustering
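The features that were clustered are not given in the slides. As an illustration only, here is a small pure-Python version of Lloyd's k-means algorithm run on hypothetical two-dimensional features per tweet, with the initial centroids chosen by hand.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centroids.append(mean(members) if members else old)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical features per tweet: (complaint-word count, praise-word count).
points = [(3, 0), (4, 0), (5, 1), (0, 2), (0, 3), (1, 4)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

K-means results depend on the starting centroids, so in practice several random starts are usually compared.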
Association between Tweet length and sentiment
• We observed that the longer the tweet, the more likely it is to be negative in sentiment.
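A simple first check of this relationship is to compare the average tweet length per sentiment. The sketch below uses hypothetical tweets and character counts; the slides do not say how length was measured.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sentiment, text) pairs.
tweets = [
    ("negative", "flight cancelled, no rebooking offered, stuck at the airport for six hours"),
    ("negative", "lost my bag again and nobody at the desk could help"),
    ("neutral", "what time does check-in open?"),
    ("positive", "great crew, thanks!"),
]

def mean_length_by_sentiment(tweets):
    """Return the average tweet length in characters for each sentiment."""
    lengths = defaultdict(list)
    for sentiment, text in tweets:
        lengths[sentiment].append(len(text))
    return {s: mean(v) for s, v in lengths.items()}

avg = mean_length_by_sentiment(tweets)
```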
What else did we try?
• A predictive model
• Setbacks we faced during the process
• Work on SPSS
• Categorization
Why this analysis? Will it help in some way?
• The airline industry lives on its customers.
• We get to know where we are doing well and where we are doing badly.
• The association between tweet length and sentiment can be a basis for a predictive model.
• Companies can get to know their competition.
• It can help improve the flight journey overall.
References
• Wikipedia.com
• Kaggle.com
• www.clarabridge.com/text-analytics/
• https://sites.google.com/site/manabusakamoto/home/r.../r-tutorial-3