language of politics on twitter - 02 twitter
Language of Politics on Twitter
Summer School in AI
American University of Beirut
June 16, 2015
Yelena Mejova
@yelenamm
Social Computing Group
Qatar Computing Research Institute, HBKU
political
analysis
Users
• individuals
• news
• organizations
• bots
• …
#hashtags
a word or phrase preceded by a hash mark (#), used within a message to identify a keyword or topic of interest and facilitate a search for it
links
all links are shortened by Twitter to form t.co/…
• shorter
• control for spam, malware, phishing
• collect clickthrough information
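The definitions above can be turned into a rough text scan. The sketch below uses simple regular expressions to pull hashtags, @mentions, and t.co links out of a tweet's text; the pattern names and the `extract_entities` helper are illustrative assumptions, and the patterns are only an approximation of how Twitter itself tokenizes tweets. The sample text is the tweet shown later in this deck.

```python
import re

# Approximate patterns (assumptions for illustration, not Twitter's own rules)
HASHTAG_RE = re.compile(r"#(\w+)")            # hash mark followed by word characters
MENTION_RE = re.compile(r"@(\w+)")            # @ followed by a screen name
TCO_RE = re.compile(r"https?://t\.co/\w+")    # links shortened by Twitter


def extract_entities(text):
    """Return hashtags, mentions, and t.co links found in the text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "links": TCO_RE.findall(text),
    }


sample = ("Don't get star struck often but I like this guy @Mo_Farah you the man boss! "
          "Much respect to you! #Doha #qatar http://t.co/wf8nc0C527")
entities = extract_entities(sample)
print(entities)
```

When the full JSON for a tweet is available, its `entities` field already lists hashtags, mentions, and URLs, so a scan like this is mostly useful for plain text.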
MEME
an idea, behavior, or style that spreads from person to person within a culture
Richard Dawkins
Monthly active users: 302 million (4/28/2015)
Total number of Twitter registered users: “about a billion” (9/16/13)
Unique monthly visitors to Twitter.com (desktop): 36 million (10/3/13)
Daily active Twitter users: 100 million (10/3/13)
Number of Twitter accounts that have ever sent a tweet: 550 million (4/14/14)
TWITTER RESEARCH
Google Trends
users
tweets
relationships
Twitter API
https://dev.twitter.com/overview/documentation
users
try it yourself
• go to https://apigee.com/console/twitter
• select OAuth1 from Authentication and log in using your Twitter account
• select api.twitter.com/1.1 from Service
• click on the […] on the left to see a list of API methods
• select […]
• enter your Twitter handle into screen_name and click […]
http://jsonviewer.stack.hu/
http://www.faceplusplus.com/demo-detect/
More info from picture
questions
where are you from?
are you male or female?
what job do you have?
when did you join?
how active are you?
what do you look like?
are you a bot?
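Two of these questions ("when did you join?" and "how active are you?") can be answered from fields in the user object alone. The sketch below is a minimal illustration: it takes `created_at` and `statuses_count` from the sample tweet shown later in this deck and computes the join date and an average tweets-per-day rate. The helper names are hypothetical, and measuring activity as a lifetime average is an assumption, not the only reasonable definition.

```python
from datetime import datetime

# Timestamp layout used in the API's created_at fields
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def parse_twitter_time(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME)


def tweets_per_day(user, as_of):
    """Average number of tweets per day between joining and as_of."""
    joined = parse_twitter_time(user["created_at"])
    days = (as_of - joined).total_seconds() / 86400.0
    return user["statuses_count"] / days


# Values copied from the sample tweet's user object
user = {
    "screen_name": "mohsin",
    "created_at": "Thu Feb 22 11:11:01 +0000 2007",
    "statuses_count": 10756,
}
as_of = parse_twitter_time("Wed May 13 11:44:24 +0000 2015")  # time of the sample tweet
joined = parse_twitter_time(user["created_at"])
rate = tweets_per_day(user, as_of)
print(joined.date(), (as_of - joined).days, round(rate, 2))
```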
tweets
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written against tweepy releases before 4.0, which still provide StreamListener.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you after.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'

# After the step above, you will be redirected to your app's page.
# Create an access token under the "Your access token" section.
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'


class StdOutListener(StreamListener):
    """A listener handles tweets that are received from the stream.
    This is a basic listener that just prints received tweets to stdout.
    """

    def on_data(self, data):
        # Print the raw JSON message without its final character
        print(data[:-1])
        return True

    def on_error(self, status):
        print(status)
Querying public stream using Python (1): https://tinyurl.com/aiss15-gettweets
import time


def auto_restart_stream(auth, listener, l_keywords):
    # Reconnect whenever the stream drops
    while True:
        try:
            sapi = Stream(auth, listener)
            sapi.filter(track=l_keywords)
        except Exception:
            # print('Restarting ;)')
            time.sleep(5)  # short pause so that reconnects do not hammer the API
            continue


if __name__ == '__main__':
    # "Qatar" in different languages and scripts
    # (the Burmese entry is not legible in this transcript; see the linked script)
    keywords = [u'Cátar', u'Catar', u'Katar', u'Katara', u'Kataras', u'Katari',
                u'Kataro', u'Qadar', u'Qatar', u'कतर', u'ਕਤਰ', u'卡塔尔', u'قطر',
                u'卡塔爾', u'카타르', u'קטאר', u'कतार', u'કતાર', u'కతర్', u'ກາຕາ',
                u'カタール', u'Κατάρ', u'Катар', u'Қатар', u'Կատար', u'কাতার',
                u'ಕತಾರ್', u'ഖത്തർ', u'කටාර්', u'กาตาร์', u'קַאטַאר', u'கத்தார்',
                u'ប្រទេសកាតា']
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auto_restart_stream(auth, l, keywords)
Querying public stream using Python (2): https://tinyurl.com/aiss15-gettweets
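The restart loop above retries indefinitely. One common refinement, sketched here with hypothetical names (`run_with_backoff`, `connect`), is to wait longer after each consecutive failure and to give up after a fixed number of attempts. This is a generic illustration of exponential backoff, not part of the original script; `connect` stands in for any callable that opens the stream and raises when the connection fails.

```python
import time


def run_with_backoff(connect, max_attempts=5, base_delay=1.0, max_delay=60.0,
                     sleep=time.sleep):
    """Call connect() until it returns normally, doubling the wait after each failure.

    Returns the list of delays (in seconds) that were waited.
    Re-raises the last error once max_attempts calls have failed.
    """
    delays = []
    for attempt in range(max_attempts):
        try:
            connect()
            return delays
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = min(base_delay * 2 ** attempt, max_delay)
            delays.append(delay)
            sleep(delay)
    return delays


# Usage with a stand-in connection that fails three times, then succeeds
state = {"calls": 0}


def flaky_connect():
    state["calls"] += 1
    if state["calls"] <= 3:
        raise IOError("connection dropped")


waited = run_with_backoff(flaky_connect, sleep=lambda seconds: None)
print(waited)
```

Passing `sleep` as a parameter keeps the example quick to run; in a real collector the default `time.sleep` would be used.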
{"created_at":"Wed May 13 11:44:24 +0000 2015","id":598453736839598080,"id_str":"598453736839598080","text":"Don't get star struck often but I like this guy @Mo_Farah you the man boss! Much respect to you! #Doha #qatar http:\/\/t.co\/wf8nc0C527","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":788413,"id_str":"788413","name":"Mohsin Ali","screen_name":"mohsin","location":"Doha, Qatar","url":"http:\/\/mohsinali.com","description":"Digital story telling, infogrpahics, interactives, R&D, Emerging Technologies, Future Trends, Innovation @ajlabs, Global Nomad, Likes Maps. LBA, DHA, BHA, DOH","protected":false,"verified":false,"followers_count":2422,"friends_count":645,"listed_count":69,"favourites_count":889,"statuses_count":10756,"created_at":"Thu Feb 22 11:11:01 +0000 
2007","utc_offset":10800,"time_zone":"Riyadh","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/462946198211407873\/xWaKYtpF.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/462946198211407873\/xWaKYtpF.jpeg","profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1249217364\/n504379828_3076_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1249217364\/n504379828_3076_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/788413\/1399210132","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":{"type":"Point","coordinates":[25.316197,51.498302]},"coordinates":{"type":"Point","coordinates":[51.498302,25.316197]},"place":{"id":"0181f32937df0de8","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/0181f32937df0de8.json","place_type":"admin","name":"Doha","full_name":"Doha, Qatar","country_code":"QA","country":"\u062f\u0648\u0644\u0629 \u0642\u0637\u0631","bounding_box":{"type":"Polygon","coordinates":[[[51.4477039,25.2216],[51.4477039,25.4263938],[51.630581,25.4263938],[51.630581,25.2216]]]},"attributes":{}},"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"Doha","indices":[97,102]},{"text":"qatar","indices":[103,109]}],"trends":[],"urls":[],"user_mentions":[{"screen_name":"Mo_Farah","name":"Mo 
Farah","id":83855918,"id_str":"83855918","indices":[48,57]}],"symbols":[],"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","url":"http:\/\/t.co\/wf8nc0C527","display_url":"pic.twitter.com\/wf8nc0C527","expanded_url":"http:\/\/twitter.com\/mohsin\/status\/598453736839598080\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","url":"http:\/\/t.co\/wf8nc0C527","display_url":"pic.twitter.com\/wf8nc0C527","expanded_url":"http:\/\/twitter.com\/mohsin\/status\/598453736839598080\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1431517464252"}
https://tinyurl.com/aiss15-tweetjson
JSON Tweet Object: http://jsonviewer.stack.hu/
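To make the structure of the tweet object concrete, the sketch below loads a trimmed copy of the sample tweet with Python's `json` module and reads a few nested fields. Only a subset of the original fields is kept, and the URL is written without the escaped slashes; everything else is copied from the sample. Note that `coordinates` stores longitude first, while the older `geo` field stores latitude first.

```python
import json

# Trimmed copy of the sample tweet (most fields omitted for brevity)
raw = '''{
  "created_at": "Wed May 13 11:44:24 +0000 2015",
  "id": 598453736839598080,
  "text": "Don't get star struck often but I like this guy @Mo_Farah you the man boss! Much respect to you! #Doha #qatar http://t.co/wf8nc0C527",
  "user": {"screen_name": "mohsin", "location": "Doha, Qatar", "followers_count": 2422},
  "geo": {"type": "Point", "coordinates": [25.316197, 51.498302]},
  "coordinates": {"type": "Point", "coordinates": [51.498302, 25.316197]},
  "place": {"full_name": "Doha, Qatar", "country_code": "QA"},
  "entities": {
    "hashtags": [{"text": "Doha", "indices": [97, 102]},
                 {"text": "qatar", "indices": [103, 109]}],
    "user_mentions": [{"screen_name": "Mo_Farah", "indices": [48, 57]}]
  }
}'''

tweet = json.loads(raw)

# Entities come pre-parsed, with character offsets into the text
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]
mentions = [m["screen_name"] for m in tweet["entities"]["user_mentions"]]
start, end = tweet["entities"]["hashtags"][0]["indices"]
first_tag_in_text = tweet["text"][start:end]

# "coordinates" is [longitude, latitude]; "geo" is [latitude, longitude]
longitude, latitude = tweet["coordinates"]["coordinates"]

print(tweet["user"]["screen_name"], hashtags, mentions, (latitude, longitude))
```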
# Python 3
import json

fin = open("rawTweets.txt", 'r', encoding="utf-8")
fout = open("parsedTweets.txt", 'w', encoding="utf-8")

for line in fin:
    line = line.strip()
    if not line:
        continue  # skip blank lines instead of stopping at the first one
    jdict = json.loads(line)

    # Keep only tweets that carry location information
    if jdict.get('coordinates') is not None or jdict.get('place') is not None:
        # Coordinates are stored as [longitude, latitude]
        if jdict.get('coordinates') is not None:
            longitude = jdict['coordinates']['coordinates'][0]
            latitude = jdict['coordinates']['coordinates'][1]
        else:
            # Assumption: leave both fields empty so every row has the same columns
            longitude = latitude = ''
        # Text, with line breaks and tabs flattened so each tweet stays on one row
        text = jdict['text'].replace('\r\n', ' ').replace('\n', ' ').replace('\t', ' ')
        fields = [
            str(longitude),
            str(latitude),
            str(jdict['id']),                # Tweet id
            jdict['user']['screen_name'],    # User screen name
            str(jdict['timestamp_ms']),      # Timestamp
            jdict['user']['lang'] or '',     # User's language
            text,
        ]
        fout.write('\t'.join(fields) + '\n')

fin.close()
fout.close()
Extracting individual fields from JSON: https://tinyurl.com/aiss15-cleanjson
Tab Separated Value (TSV) format
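A file in this format can be read back with Python's `csv` module. The sketch below assumes the column order written by the parsing script above (longitude, latitude, tweet id, screen name, timestamp, user language, text); the column names themselves are labels chosen here for illustration, and the sample row is built from values in the sample tweet with a shortened text.

```python
import csv
import io

# Hypothetical column labels, in the order the parsing script writes them
COLUMNS = ["longitude", "latitude", "tweet_id", "screen_name",
           "timestamp_ms", "user_lang", "text"]

# One row in TSV form (text shortened for readability)
sample_tsv = ("51.498302\t25.316197\t598453736839598080\tmohsin\t"
              "1431517464252\ten\tMuch respect to you! #Doha #qatar\n")


def read_rows(handle):
    """Read tab-separated rows into dictionaries keyed by COLUMNS."""
    # QUOTE_NONE: quotation marks inside tweets are kept as ordinary characters
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader]


rows = read_rows(io.StringIO(sample_tsv))
print(rows[0]["screen_name"], rows[0]["text"])
```

With a real file, `open("parsedTweets.txt", encoding="utf-8", newline="")` would take the place of the `io.StringIO` object.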
Language Model
http://tweetcloud.icodeforlove.com/
workshop 25
twitter 20
religion 17
interaction 12
online 12
dyad 9
research 9
accepted 7
…
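Word counts like these are the starting point for a simple unigram language model, in which each word's probability is its count divided by the total number of words. The sketch below shows one minimal way to compute it; the example tweets are made up for illustration, and the tokenizer (lowercase letters and apostrophes only) is a deliberate simplification that would mangle hashtags and URLs.

```python
import re
from collections import Counter


def unigram_counts(texts):
    """Count lowercase word tokens across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def unigram_probabilities(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Made-up example tweets (not from the original data)
tweets = [
    "Workshop on religion and Twitter",
    "Twitter research accepted",
    "Online interaction research on Twitter",
]
counts = unigram_counts(tweets)
probs = unigram_probabilities(counts)
print(counts.most_common(3))
```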
questions
what are you interested in?
how do you eat/sleep/work/hang out?
how happy are you?
what political opinions do you have?
what outside sources do you link to?
what new emerging topics are you mentioning?
how do you behave?
are you a bot?
network
network
nodes
edges
User Network
Follower Network
Mention Network
Mention Network for hashtags
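A mention network can be built directly from the `user_mentions` entities of collected tweets: each tweet adds a directed edge from its author to every account it mentions. The sketch below is a minimal version using only the standard library. The function names are hypothetical, the first example tweet mirrors the sample tweet, and the accounts "alice" and "bob" are invented for illustration.

```python
from collections import Counter


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) across tweets."""
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        for mention in tweet["entities"]["user_mentions"]:
            edges[(source, mention["screen_name"])] += 1
    return edges


def in_degree(edges):
    """Total number of mentions received by each user."""
    degree = Counter()
    for (_, target), weight in edges.items():
        degree[target] += weight
    return degree


# Minimal tweet objects; "alice" and "bob" are invented accounts
tweets = [
    {"user": {"screen_name": "mohsin"},
     "entities": {"user_mentions": [{"screen_name": "Mo_Farah"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "Mo_Farah"},
                                    {"screen_name": "mohsin"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "Mo_Farah"}]}},
]
edges = mention_edges(tweets)
received = in_degree(edges)
print(received.most_common())
```

Counting mentions received is only one rough proxy for influence; the same edge list could be handed to a graph library for other measures.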
questions
how influential are you?
how influential are your connections?
who influences you?
what are people around you like?
do you bring together different communities?
how fast will you know about a piece of news?
are you an opinion leader?
are you a bot?
resources
https://dev.twitter.com/overview/documentation
https://apigee.com
try it in your favorite language
https://dev.twitter.com/overview/api/twitter-libraries
next
using Twitter data for real-world political speech mining