Deciphering Social Media Messages for #GE2015
Dr. Giuseppe Di Fatta (SSE)
Associate Professor of Computer Science
Director MSc Advanced Computer Science
School of Systems Engineering, University of Reading
http://www.personal.reading.ac.uk/~sis06gd/
Dr. James Reade (SPEIR)
Lecturer in Economics
School of Politics, Economics and International Relations,
University of Reading
http://www.reading.ac.uk/economics/about/staff/j-j-reade.aspx
Henley Business School, University of Reading, Friday 24 April 2015
Workshop on Big Social Data and Interdisciplinary Analytics
by J. Reade and G. Di Fatta
Outline
• Introduction
– Motivation
– University of Reading initiative on Big Social Data
– Case study on the General Election 2015 (#GE2015)
• Nuts and bolts
– Twitter tracking and tweets gathering
– Tweets mining
– A knowledge discovery process
• Data analysis examples
– analysis of some key moments during the Leaders’ TV debate
Introduction
• Social media has exploded in recent years.
Introduction
• Social media defined:
• Incredible numbers, incredible potential…
• We are the University of Reading Big Social Data Research Group.
• Formed in summer 2014, covering multiple disciplines across the university.
Form and Function
• Social media are of interest to social scientists:
• Social (and other) networks influence decision making.
• Favouritism, discrimination, bias, loyalty, etc. all influence
allocations of resources and outcomes.
• Social media are social networks quantified.
• Social networks publicise and propagate information:
• Information availability is crucial in decision making.
• More information = better forecasting, better policy making?
• Social media present huge opportunities:
• But also huge challenges: collecting, processing and understanding
the data.
• Cross-disciplinary collaboration is essential.
An Open Multidisciplinary Group
Our group consists of:
• Computer scientists (Di Fatta, Stahl)
– Data Mining and Knowledge Discovery in Databases (KDD): collecting, processing and
extracting useful knowledge from data.
• Mathematicians (Vukadinović Greetham)
– Complex analysis of network dynamics.
• Applied Linguists (Jaworska)
– Extracting meaning from qualitative data.
• Economists (Reade, Nanda)
– Information is fundamental: Where does it appear, how is it propagated? Does it
influence prices/voting behaviour, or vice versa?
• Social scientists
– What can we learn about social (and other types of) interaction and outcomes?
Reading and the General Election
• On 1 March we began collecting tweets related to
politics and the general election
– General-election-related tweets: #GE2015, #Tories, #Labour, etc.
• In 53 days we’ve collected:
– 13M tweets (~250K tweets/day, ~2.8 tweets/sec),
– with over 1.8M tweets during the three TV debates alone.
• But what to do with this information?
[Figure: tweet volume over time, annotated at April 2 and April 16]
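The per-day and per-second rates quoted above follow from simple division. The check below assumes exactly 13 million tweets over exactly 53 days, so the figures are approximate, as on the slide.

```python
# Back-of-the-envelope check of the collection rates quoted on the slide.
# Assumption: exactly 13 million tweets collected over exactly 53 days.
TOTAL_TWEETS = 13_000_000
DAYS = 53
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = TOTAL_TWEETS / DAYS                   # ~245K, rounded to 250K on the slide
tweets_per_second = tweets_per_day / SECONDS_PER_DAY   # ~2.8

print(f"{tweets_per_day:,.0f} tweets/day, {tweets_per_second:.1f} tweets/sec")
# prints: 245,283 tweets/day, 2.8 tweets/sec
```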
Sentiment Analysis?
• Simple volume of tweets may be interesting, but is it useful?
• Increasing focus on sentiment, or mood: What do people
think?
– Does mood/sentiment yield predictive power?
– Academic papers have considered stock markets and sports events.
– During election time, sentiment is hugely interesting…
• Who is ahead? Do big shifts occur?
• What messages stick? Persistence in sentiment?
Sentiment Analysis?
• Perhaps, however, we have skipped a step:
– Sentiment is a latent concept: We never observe its true value.
– We can try to estimate it but we have no true value to compare against.
Nuts and Bolts
• Twitter, described as “the SMS of the Internet”, is an online
social networking service that enables users to send and read
short messages of up to 140 characters, called “tweets”.
– launched in 2006
– photos and short videos can also be embedded
• In 2012: 100 million users and 340 million tweets per day
• In December 2014: more than 500 million users, of whom more than
284 million were active.
• Record tweets: on February 3, 2013, Twitter announced that a
record 24.1 million tweets had been sent on the night of the Super Bowl.
Twitter Per Second Records
1. 143K TPS: TV broadcast of the anime film "Castle in the Sky" in Japan on
Aug. 3, 2013
– At one point viewers joined forces, sending tweets at the same time to
symbolically help the film's characters cast a spell.
2. 15K TPS: Euro 2012 Final
– as Spain scored the winning goal against Italy in the 2012 European
Championship final
3. 10K TPS: Last Minutes of Super Bowl 2012
– as the Giants took the lead on a touchdown with 57 seconds left
…
16. 5.5K TPS: Japanese Earthquake and Tsunami on March 11, 2011
– Twitter turned into an emergency service for many following an 8.9 magnitude
earthquake and subsequent tsunami on Japan’s coast, while in Tokyo the
phone system went down.
Gathering Tweets
• Three methods to retrieve tweets
– Search API
• Representational State Transfer (REST) requests
• max 3,200 tweets per request
• free
– Streaming API
• real-time streaming, OAuth for secure delegated access
• max 1% of the total volume of tweets
• free
– Firehose
• real-time streaming
• unlimited and guaranteed
• not free: only from Twitter commercial partners (e.g., DataSift)
Twitter Tracking and Tweets Collection
• Tracking terms on the Twitter Streaming API and gathering all
tweets which match them.
– more than 30 tracked terms, e.g.: ge2015, uklabour, conservative,
votetories, ukip, voteukip, LibDems, GreenParty, SNP, etc.
• But what if you track “Cameron”?
Cameron Dallas is an 18-year-old Vine celebrity.
Vine is a short video sharing service and microblogging website.
Twitter Tracking and Tweets Collection
• And what if you track “Labour”?
Twitter Tracking and Tweets Collection
• New software has been developed.
– Objective: collect all tweets from March to May 2015 that are related to
UK politics.
– The software tracks terms on the Twitter Streaming API and gathers all
tweets which match them.
1. tracked terms (~30)
– e.g., ge2015, uklabour, votelabour, conservative, votetories, ukip, voteukip,
LibDems, GreenParty, SNP, etc.
2. tracked terms that require a context check
– e.g. labour, greens, etc.
3. terms for context check (~50)
– e.g., government, politic, vote, election, parliament, economy, etc.
4. rejected terms
– e.g. USA, Canada, Clinton, TCOT, etc.
5. equivalent terms for aggregation of party references
– e.g. Tories, Tory, voteTories, Conservatives, etc.
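The five term lists above can be read as a small decision procedure for each incoming tweet. The sketch below is one possible reading, not the group's actual software: the word sets are small illustrative subsets of the slide's examples, and the tokenisation (lowercase words with `#`/`@` dropped), the order of the checks (rejected terms first) and the names `is_political` and `party_mentions` are assumptions.

```python
import re

# Hypothetical, illustrative subsets of the term lists on the slide.
TRACKED = {"ge2015", "uklabour", "votelabour", "conservative", "votetories",
           "ukip", "voteukip", "libdems", "greenparty", "snp"}
NEEDS_CONTEXT = {"labour", "greens"}
CONTEXT = {"government", "politic", "vote", "election", "parliament", "economy"}
REJECTED = {"usa", "canada", "clinton", "tcot"}
# Equivalent terms mapped to one canonical party name (for aggregation).
EQUIVALENT = {"tories": "Conservatives", "tory": "Conservatives",
              "votetories": "Conservatives", "conservatives": "Conservatives",
              "conservative": "Conservatives",
              "labour": "Labour", "uklabour": "Labour", "votelabour": "Labour",
              "snp": "SNP", "ukip": "UKIP", "voteukip": "UKIP"}


def tokens(text):
    """Lowercase alphanumeric words; '#' and '@' prefixes are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_political(text):
    """Decide whether a tweet matching the stream belongs to the collection."""
    words = tokens(text)
    wordset = set(words)
    if wordset & REJECTED:          # e.g. US politics: reject outright
        return False
    if wordset & TRACKED:           # unambiguous tracked term
        return True
    if wordset & NEEDS_CONTEXT:     # ambiguous term: require a context word
        # Context terms such as 'politic' are stems, so match them as prefixes.
        return any(w.startswith(c) for w in words for c in CONTEXT)
    return False


def party_mentions(text):
    """Canonical parties referred to through any of their equivalent terms."""
    return {EQUIVALENT[w] for w in tokens(text) if w in EQUIVALENT}
```

For example, "Labour's plan for the economy" is kept because "economy" supplies the context, while "Labour of love: my new painting" and a tweet about Cameron Dallas are not; `party_mentions("Tories and Labour clash on #GE2015")` returns `{"Conservatives", "Labour"}`.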
A Multi-Threaded Process
• There are three concurrent threads of execution which never stop:
1. the tweets consumer, which
• manages the stream of tweets for the tracked terms,
• receives and processes tweets from Twitter in real time and
• stores them in secondary storage;
2. the controller, which
• checks that the tweets consumer is working properly
• and, if not, starts a new consumer;
3. the observer, which
• generates and sends periodic summaries by email.
• Further analytics are generated off-line by additional processing, such as
– counts, word clouds, co-occurrence of terms, sentiment index
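The consumer/controller/observer design can be sketched with Python's `threading` module. This is a minimal illustration under loud assumptions, not the group's implementation: the Twitter stream is replaced by an in-memory queue, a `None` item simulates a dropped connection that kills the consumer, "secondary storage" is a list, the observer appends summaries to a list instead of emailing them, and all class and function names are hypothetical. Unlike the real threads, these stop when the `stop` event is set, so the demo terminates.

```python
import queue
import threading
import time


class TweetConsumer(threading.Thread):
    """Reads tweets from a stream (here: a queue) and stores them."""

    def __init__(self, stream, storage, lock):
        super().__init__(daemon=True)
        self.stream, self.storage, self.lock = stream, storage, lock

    def run(self):
        while True:
            tweet = self.stream.get()
            if tweet is None:        # simulated dropped connection:
                return               # the consumer thread dies
            with self.lock:
                self.storage.append(tweet)  # stand-in for writing to disk


def controller(stream, storage, lock, stop, stats, interval=0.005):
    """Watchdog: if the consumer has died, start a new one."""
    consumer = TweetConsumer(stream, storage, lock)
    consumer.start()
    stats["restarts"] = 0
    while not stop.is_set():
        if not consumer.is_alive():
            consumer = TweetConsumer(stream, storage, lock)
            consumer.start()
            stats["restarts"] += 1
        time.sleep(interval)


def observer(storage, lock, stop, reports, interval=0.01):
    """Periodic summaries; the system described on the slide emails these."""
    while not stop.wait(interval):
        with lock:
            reports.append(f"{len(storage)} tweets stored")


def run_pipeline(items, expected, timeout=2.0):
    """Run controller and observer until `expected` tweets are stored (or timeout)."""
    stream, storage, lock = queue.Queue(), [], threading.Lock()
    stop, stats, reports = threading.Event(), {}, []
    for item in items:
        stream.put(item)
    threads = [
        threading.Thread(target=controller, args=(stream, storage, lock, stop, stats)),
        threading.Thread(target=observer, args=(storage, lock, stop, reports)),
    ]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with lock:
            if len(storage) >= expected:
                break
        time.sleep(0.005)
    stop.set()
    for t in threads:
        t.join()
    return storage, stats["restarts"], reports


storage, restarts, _ = run_pipeline(["t1", "t2", None, "t3", None, "t4"], expected=4)
print(storage, restarts)
```

With this input, each `None` kills the current consumer and the controller starts a replacement, so all four tweets are stored in order after two restarts. Because only one consumer exists at a time, no restart can be missed.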
Tweets Mining
• Term frequency
– Tweets as bags of words for computing
• Frequent tracked terms
• Frequent words
– Word clouds
• Twitter Sentiment Index
– A list of adjectives has been extracted from ‘political’ tweets
– Each adjective has been classified as positive, negative or neutral by
several team members.
– If a party or one of its equivalent terms is present in a tweet, positive
and negative adjectives contribute to a sentiment index for the party.
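The slide does not give the exact formula by which positive and negative adjectives are combined. The sketch below assumes a simple net-positivity ratio, (positive − negative) / (positive + negative), which lies in [−1, 1], and credits every adjective in a tweet to every party the tweet mentions. The adjective lists and party terms are tiny hypothetical stand-ins for the team's hand-labelled lists, and `sentiment_index` is an illustrative name, not necessarily the team's definition.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the hand-labelled adjective lists.
POSITIVE = {"great", "strong", "honest", "brilliant"}
NEGATIVE = {"weak", "awful", "dishonest", "terrible"}
# Equivalent terms -> party (a small subset, for illustration).
PARTY_TERMS = {"tories": "Conservatives", "tory": "Conservatives",
               "conservatives": "Conservatives", "labour": "Labour", "snp": "SNP"}


def sentiment_index(tweets):
    """Net sentiment per party: (pos - neg) / (pos + neg), in [-1, 1].

    Assumption: every positive/negative adjective in a tweet counts for
    every party the tweet mentions. Parties with no adjectives are omitted.
    """
    pos, neg = defaultdict(int), defaultdict(int)
    for text in tweets:
        words = re.findall(r"[a-z0-9]+", text.lower())
        parties = {PARTY_TERMS[w] for w in words if w in PARTY_TERMS}
        n_pos = sum(w in POSITIVE for w in words)
        n_neg = sum(w in NEGATIVE for w in words)
        for party in parties:
            pos[party] += n_pos
            neg[party] += n_neg
    return {p: (pos[p] - neg[p]) / (pos[p] + neg[p])
            for p in set(pos) | set(neg) if pos[p] + neg[p] > 0}
```

Normalising by the number of sentiment-bearing adjectives keeps the index comparable between parties that are mentioned very different numbers of times; an unnormalised difference would mostly track tweet volume.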
A Knowledge Discovery Process
• A process of knowledge discovery from social media data streams (Twitter)
[Figure: Streaming API → data gathering, filtering and in-line analytics → data storage → off-line data analytics]
• 1h and 24h automatic reports sent to:
– To join the mailing list please contact <[email protected]>
– Blog URL: http://blogs.reading.ac.uk/reading-general-election-blog/
• From 1 March: 13M tweets, currently 350K tweets per day
Number of Tweets per day
• as reported by the observer by email at midnight
• Important TV events:
– debate-1 on 26/03/2015: “Cameron & Miliband: The Battle for Number 10”
– debate-2 on 02/04/2015: “Leaders’ debate”
– debate-3 on 16/04/2015: “Challengers’ debate”
[Figure: number of tweets per day, with debate-1, debate-2 and debate-3 marked]
Twitter Boom on 2 April
• Leaders’ Debate (02-04-2015, 20:00-22:00)
– If you missed the TV debate, you can watch it on YouTube:
• https://www.youtube.com/watch?v=7Sv2AOQBd_s
• ‘political’ tweets over the entire day (24h)
– recorded: 800,350
– #leadersdebate: 438,944
– missed: 175,959 (18%, because of the Twitter track limit)
– estimated total (recorded + missed): 976,309
• TV-debate-related/induced tweets from 19:00 to 24:00
– recorded: 614,800
– estimated total (recorded + missed): 790,759
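The estimated totals are recorded plus missed tweets (800,350 + 175,959 = 976,309, and 175,959 / 976,309 ≈ 18%). As we understand the documentation of Twitter's v1.1 Streaming API (since retired), the "missed" count could be read from limit notices of the form `{"limit": {"track": N}}`, where N is the cumulative number of undelivered matching tweets since the connection opened. The sketch assumes one JSON message per line from a single connection; with reconnections, the last N of each connection would have to be summed. The function name `estimate_totals` is hypothetical.

```python
import json


def estimate_totals(lines):
    """Count delivered tweets and undelivered ones reported by limit notices.

    Assumes one JSON message per line from a single streaming connection,
    with limit notices shaped like {"limit": {"track": N}}, N cumulative.
    """
    recorded, missed = 0, 0
    for line in lines:
        line = line.strip()
        if not line:                  # keep-alive newlines carry no data
            continue
        msg = json.loads(line)
        if "limit" in msg:
            # N is cumulative, so keep the largest value rather than summing.
            missed = max(missed, msg["limit"]["track"])
        elif "text" in msg:
            recorded += 1
    total = recorded + missed
    share = missed / total if total else 0.0
    return recorded, missed, total, share


# The slide's 24h figures: recorded + missed = estimated total.
slide_recorded, slide_missed = 800_350, 175_959
slide_total = slide_recorded + slide_missed
print(slide_total, f"{slide_missed / slide_total:.0%}")   # prints: 976309 18%
```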
Leaders’ Debate (02-04-2015)
• The number of tweets with a reference to a party
[Figure: # tweets in 5’ intervals, with the debate period marked]
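Counting tweets per party in fixed intervals amounts to rounding each timestamp down to the start of its interval and counting (interval, party) pairs. The sketch below assumes each tweet is already available as a `datetime` plus the set of parties it refers to (for instance after aggregating equivalent terms); the function names are hypothetical, and the rounding only works for interval lengths that divide 60 minutes, such as the 5’ and 1’ intervals used here.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, minutes=5):
    """Round a timestamp down to the start of its interval (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second, microseconds=ts.microsecond)


def counts_per_interval(tweets, minutes=5):
    """tweets: iterable of (timestamp, parties) -> Counter keyed by (interval start, party)."""
    counts = Counter()
    for ts, parties in tweets:
        start = bucket_start(ts, minutes)
        for party in parties:
            counts[(start, party)] += 1
    return counts
```

A tweet mentioning two parties is counted once for each, which matches plotting one series per party.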
Leaders’ Debate (02-04-2015)
• Twitter Sentiment Index: before, during and after the debate
[Figure: Twitter Sentiment Index in 5’ intervals, with the debate period marked]
Leaders’ Debate (02-04-2015)
• Two ‘interesting’ moments during the debate: #1 at 20:55 and #2 at 21:35
– and two ‘interesting’ time intervals following those moments (10’ and 20’ respectively)
[Figure: Twitter Sentiment Index in 1’ intervals]
Leaders’ Debate (02-04-2015, 20:54)
#1 @20:54: Nigel Farage’s controversial statement (18-second video)
Leaders’ Debate (02-04-2015, 20:55)
#1 @20:55: Nicola Sturgeon’s reply to Nigel Farage (8-second video)
Leaders’ Debate (02-04-2015, #1)
• #1: word cloud for tweets referring to “SNP” from 21:02 to 21:12
Leaders’ Debate (02-04-2015, 21:35)
#2 @21:35: Nicola Sturgeon’s statement (12-second video)
Leaders’ Debate (02-04-2015, #2)
• #2: word cloud for tweets referring to “SNP” from 21:40 to 22:00
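A word cloud is a rendering of word frequencies, so the analytical step behind these two slides is counting words in the tweets that mention a term within a time window (e.g. "SNP" from 21:40 to 22:00). The sketch below assumes tweets arrive as (`datetime`, text) pairs; the tiny stop-word list, the tokenisation and the name `window_word_counts` are hypothetical, and the drawing itself is left to a visualisation library.

```python
import re
from collections import Counter
from datetime import datetime

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "on", "for", "in", "rt"}


def window_word_counts(tweets, term, start, end, top=20):
    """Most frequent words in tweets containing `term`, with start <= time < end.

    tweets: iterable of (datetime, text) pairs.
    """
    term = term.lower()
    counts = Counter()
    for ts, text in tweets:
        if not (start <= ts < end):
            continue
        words = re.findall(r"[a-z0-9]+", text.lower())
        if term not in words:
            continue
        # The queried term itself would dominate the cloud, so drop it too.
        counts.update(w for w in words if w not in STOP_WORDS and w != term)
    return counts.most_common(top)
```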
Conclusions
• We have been collecting Big Social Data: tweets about UK politics and GE2015
since March 2015.
• Simple real-time analysis and more complex off-line
analytics can provide interesting insights.
• We will use the data in the future to test research ideas on:
– Text mining
– Data visualisation
– Complex networks (social networks)
– Economics and Politics
Acknowledgments:
Prof. Steven Mithen (Deputy VC) for supporting this project,
as well as HBS, SPEIR, SSE and SLL
Questions?