five steps to get tweets sent by a list of users
Post on 17-Jul-2015
Collecting Tweets Sent by a List of Users
• This Python tutorial is brought to you by
CuriosityBits.com, with the generous support of
Dr. Gregory D. Saxton (http://social-metrics.org/)
Five Steps…
1. Install Python and necessary Python libraries.
2. Set up Twitter API Keys.
3. Prepare a list of Twitter handles (screen-names) in .csv
format.
4. Create a SQLite database using SQLite Browser, and
import the Twitter handle list.
5. Modify the Python script and run it to get results!
Download the Python script: https://drive.google.com/file/d/0Bwwg6GLCW_IPVmNBMUV4bVhUU0U/edit?usp=sharing
The results you will get…
You will get an ample amount of metadata for each tweet collected.
Here is a breakdown of some important output variables:
Name                        Definition
tweet_id                    The unique identifier for a tweet
inserted_date               When the tweet was downloaded into your database
language                    The language of the tweet
retweeted_status            Is the tweet a retweet?
content                     The content of the tweet
from_user_screen_name       The Twitter handle of the sender
created_at                  When the tweet was sent
from_user_followers_count   The number of followers the sender has
from_user_friends_count     The number of users the sender is following
from_user_listed_count      How many times the sender is listed by other users
from_user_statuses_count    The number of tweets sent by the sender
from_user_description       The profile bio of the sender
from_user_location          The location of the sender
from_user_created_at        When the sender’s Twitter account was created
retweet_count               How many times the tweet has been retweeted
entities_urls               The URLs included in the tweet
entities_urls_count         The number of URLs included in the tweet
entities_hashtags           The hashtags included in the tweet
entities_hashtags_count     The number of hashtags in the tweet
entities_mentions           The Twitter handles mentioned in the tweet
in_reply_to_screen_name     Whom the sender is replying to
in_reply_to_status_id       The unique identifier of the tweet the sender is replying to
entities_expanded_urls      Complete URLs extracted from short URLs
json_output                 The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count        NA
media_expanded_url          NA
media_url                   NA
media_type                  NA
video_link                  NA
photo_link                  NA
twitpic                     NA
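To make the mapping between a raw tweet and these columns more concrete, here is a minimal sketch of how some of them could be derived from one tweet in the Twitter API v1.1 JSON layout. It is not the tutorial's script: the function name tweet_to_row and the sample values are made up for illustration, and the actual script may store some columns (for example retweeted_status) differently.

```python
import json


def tweet_to_row(tweet):
    """Flatten one tweet (a dict parsed from API JSON) into column values.

    Hypothetical helper for illustration; not taken from the tutorial script.
    """
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    urls = entities.get("urls", [])
    hashtags = entities.get("hashtags", [])
    mentions = entities.get("user_mentions", [])
    return {
        "tweet_id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "language": tweet.get("lang"),
        "content": tweet.get("text"),
        # The "retweeted_status" key is only present when the tweet is a retweet.
        "retweeted_status": "retweeted_status" in tweet,
        "retweet_count": tweet.get("retweet_count"),
        "from_user_screen_name": user.get("screen_name"),
        "from_user_followers_count": user.get("followers_count"),
        "entities_urls": ", ".join(u.get("url") or "" for u in urls),
        "entities_urls_count": len(urls),
        "entities_expanded_urls": ", ".join(u.get("expanded_url") or "" for u in urls),
        "entities_hashtags": ", ".join(h["text"] for h in hashtags),
        "entities_hashtags_count": len(hashtags),
        "entities_mentions": ", ".join(m["screen_name"] for m in mentions),
        "in_reply_to_screen_name": tweet.get("in_reply_to_screen_name"),
        "in_reply_to_status_id": tweet.get("in_reply_to_status_id"),
        # Keep the whole tweet so fields not parsed into columns are not lost.
        "json_output": json.dumps(tweet),
    }


# A made-up tweet, trimmed to the fields used above.
sample = {
    "id_str": "1",
    "created_at": "Thu Jul 17 12:00:00 +0000 2014",
    "lang": "en",
    "text": "Hello #python http://t.co/x",
    "retweet_count": 3,
    "user": {"screen_name": "example_user_a", "followers_count": 10},
    "entities": {
        "urls": [{"url": "http://t.co/x", "expanded_url": "http://example.com/"}],
        "hashtags": [{"text": "python"}],
        "user_mentions": [],
    },
    "in_reply_to_screen_name": None,
    "in_reply_to_status_id": None,
}
row = tweet_to_row(sample)
print(row["from_user_screen_name"], row["entities_hashtags"])
```

Keeping the full JSON in json_output means you can parse additional fields later without collecting the tweets again.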
Step 1. Install Python and necessary libraries
Download Anaconda Python 2.7 to run Python scripts. Anaconda is free to download. Once you’ve installed Anaconda, you can modify scripts in Spyder.
• Do you know how to install the necessary Python libraries? If not, please review pg. 8 in
http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
Install the following libraries:
• Simplejson (https://pypi.python.org/pypi/simplejson)
• Sqlite3 (http://sqlite.org/)
• Sqlalchemy (http://www.sqlalchemy.org/)
• Twython
(https://twython.readthedocs.org/en/latest/index.html)
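Before moving on, it can help to confirm that the four libraries are importable. The following is a small sketch, not part of the tutorial's script; the helper name missing_libraries is made up. Note that sqlite3 ships with Python's standard library, while the other three usually need to be installed (for example with pip install simplejson sqlalchemy twython).

```python
import importlib

# Module names as they are imported in Python code.
REQUIRED = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def missing_libraries(names):
    """Return the names that cannot be imported in this Python installation."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print("Missing libraries: %s" % (", ".join(missing_libraries(REQUIRED)) or "none"))
```

If anything is reported as missing, install it and run the check again before continuing with Step 2.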
Step 2: Set up Twitter API Keys.
First, go to https://dev.twitter.com/ and sign in with your Twitter account. Go to the My applications page to create an application. When filling in the application form:
• Enter any name that makes sense to you.
• Enter any text that makes sense to you.
• You can enter any legitimate URL; here, I put in the URL of my institution.
• Same as above: any legitimate URL works; I again put in the URL of my institution.
Then go to the API Keys page, scroll down to the bottom, and click Create my access token. Wait a few minutes and refresh the page, and you will get all your keys!
You need four values: API Key, API Secret, Access token, and Access token secret.
Step 3: Prepare a Twitter handle list
Create a list of the Twitter handles whose tweets you want to collect. You can create the list in Excel and save it in .csv format. The list should contain three columns (to match the configuration in the Python script):
• The first column lists sequential numbers beginning with 1.
• The second column lists Twitter handles.
• For the third column, I entered 1 all throughout, but you can leave it blank.
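If you would rather generate the list with Python than with Excel, the three-column layout described above can be sketched as follows. The function names, the example handles, and the file name accounts.csv are made up for illustration, and the sketch assumes the file has no header row, since the column names are set later in the SQLite browser.

```python
def build_accounts_csv(handles):
    """Return CSV text with three columns: row number, handle, and 1."""
    lines = []
    for number, handle in enumerate(handles, 1):
        # Drop surrounding spaces and a leading "@" if the handle has one.
        screen_name = handle.strip().lstrip("@")
        lines.append("%d,%s,1" % (number, screen_name))
    return "\n".join(lines) + "\n"


def save_accounts_csv(handles, path):
    """Write the CSV text to a file, e.g. save_accounts_csv(handles, "accounts.csv")."""
    with open(path, "w") as csv_file:
        csv_file.write(build_accounts_csv(handles))


print(build_accounts_csv(["@example_user_a", " example_user_b "]))
```

Twitter handles contain only letters, digits, and underscores, so joining the fields with commas is safe here without any quoting.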
Step 4: Create a SQLite database
Go to http://sqlitebrowser.sourceforge.net/ and download SQLite Database Browser. It allows you to view and edit SQLite databases.
• File-New Database to create a new database.
• Remember the database filename you enter.
• The default file extension is .sqlite. To prevent future complications, add the .sqlite extension when typing the filename.
Use File-Import Table From CSV File to import the .csv file you’ve saved. Name the imported table accounts; this table name corresponds to the one used in the Python script. After you click Create, the csv list will be loaded into the database, and you can browse it in Browse Data. Lastly, remember to save the database.
Stay on the database you’ve just created.
Modify the imported table: go to Edit-Modify Tables and use Edit field to change the column names. To correspond to the Python script, name the first column rowid with Field type Integer, the second column screen_name with Field type String, and the third column user_type with Field type String. In the end, the database table should be defined as shown in the screenshot.
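For readers who prefer code to the SQLite browser, the same table definition can be sketched with Python's built-in sqlite3 module. This is an alternative illustration, not the tutorial's method: the function name and the sample rows are made up, TEXT is used as SQLite's usual type name for strings, and the example uses an in-memory database rather than your .sqlite file.

```python
import sqlite3


def create_accounts_table(conn, rows):
    """Create the accounts table and load (rowid, screen_name, user_type) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "rowid INTEGER, screen_name TEXT, user_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO accounts (rowid, screen_name, user_type) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# ":memory:" keeps this example from touching any file on disk.
conn = sqlite3.connect(":memory:")
create_accounts_table(conn, [(1, "example_user_a", "1"), (2, "example_user_b", "1")])
stored = conn.execute(
    "SELECT rowid, screen_name, user_type FROM accounts ORDER BY rowid"
).fetchall()
print(stored)
```

To work on a real database, pass the filename you chose earlier to sqlite3.connect instead of ":memory:".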
Step 5: Modify the script and Run
Find this block of code, and enter your API keys:
• API Key
• API secret
• Access token
• Access token secret
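The block in the downloaded script may look different, but a key block for Twython typically resembles the sketch below. The variable names and the helpers blank_keys and make_client are assumptions for illustration; only the Twython constructor taking the four values in this order comes from the Twython library itself.

```python
# Paste your own values between the quotes (left blank here on purpose).
KEYS = {
    "APP_KEY": "",             # API Key
    "APP_SECRET": "",          # API Secret
    "OAUTH_TOKEN": "",         # Access token
    "OAUTH_TOKEN_SECRET": "",  # Access token secret
}


def blank_keys(keys):
    """Return the names of any keys that are still empty."""
    return [name for name, value in sorted(keys.items()) if not value.strip()]


def make_client(keys):
    """Create a Twython client; requires the twython library to be installed."""
    from twython import Twython
    return Twython(
        keys["APP_KEY"],
        keys["APP_SECRET"],
        keys["OAUTH_TOKEN"],
        keys["OAUTH_TOKEN_SECRET"],
    )


print("Keys still to fill in: %s" % (", ".join(blank_keys(KEYS)) or "none"))
```

Checking for blank values first gives a clearer message than the authentication error you would otherwise get from Twitter.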
Find this block of code, and enter the filename and file path of the SQLite database you have created.
You need to match the file path and file name to the SQLite database you’ve created (RECOMMENDED).
If the Python script file and the SQLite database are in the same folder, just paste your database name here.
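As a rough illustration of the difference between a bare filename and a full path, here is a sketch that builds a SQLAlchemy-style SQLite URL and checks whether the file exists. It assumes the script connects through SQLAlchemy, which the library list suggests but the slides do not confirm; the helper names and the filename tweets.sqlite are made up.

```python
import os


def sqlite_url(db_path):
    """Build a SQLAlchemy-style SQLite URL from a file path."""
    # A bare filename is looked up relative to the folder the script runs in.
    return "sqlite:///" + db_path


def describe_database(db_path):
    """Report the URL and whether the database file can be found."""
    status = "found" if os.path.isfile(db_path) else "NOT found"
    return "%s (%s)" % (sqlite_url(db_path), status)


print(describe_database("tweets.sqlite"))
```

If the file is reported as not found, the script is probably running from a different folder, and giving the full path is the safer choice.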
Step 5: Specify search criteria
You can refine the search criteria, e.g.:
Count: specifies the number of tweets to try to retrieve for each Twitter handle. The maximum value is 200.
More at https://dev.twitter.com/docs/api/1/get/statuses/user_timeline
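The count setting can be sketched as a small helper that builds the request parameters and keeps count within the documented maximum of 200. The helper name timeline_params and the example handle are made up; screen_name, count, and include_rts are parameters of the user_timeline endpoint, and with a Twython client the call would presumably be twitter.get_user_timeline(**params).

```python
MAX_COUNT = 200  # the maximum the API accepts per request


def timeline_params(screen_name, count=200, include_rts=True):
    """Build user_timeline parameters, clamping count to the range 1..200."""
    return {
        "screen_name": screen_name,
        "count": min(max(count, 1), MAX_COUNT),
        "include_rts": include_rts,
    }


params = timeline_params("example_user_a", count=500)
print(params["count"])
```

Asking for more than 200 does not return more tweets in a single request, so clamping the value avoids a misleading setting.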
Step 5: Modify the script and Run
In Spyder, go to Run and choose Execute in a new dedicated Python interpreter. The first option, Execute in current Python or IPython interpreter, does not work on my end, but it may work on your computer.
Some issues you may encounter
“Too many values to unpack” ERRORS!!
Don’t panic! It is almost certain that you will hit roadblocks when learning Python. So, be prepared to debug.
This error probably occurs because you’ve saved the Python script file somewhere other than the default Python folders.
But what is a default Python folder?
Find your default Python folders
A simple way to find your default Python folders on a Windows machine: in the Start menu, right-click Computer and choose Properties.
On my machine, C:\Anaconda\Lib\site-packages is one of the default Python folders.
If the Python script runs successfully, it should give you the output shown in the screenshot.
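Another way to see which folders Python searches, on any operating system, is to ask Python itself. This is a sketch added for illustration, and it assumes that "default Python folders" refers to the entries on Python's module search path (sys.path), which normally includes site-packages; the helper name is made up.

```python
import sys


def python_search_folders():
    """Return the folders Python searches when importing modules."""
    # Empty entries stand for the current folder, so they are skipped here.
    return [folder for folder in sys.path if folder]


for folder in python_search_folders():
    print(folder)
```

Run it in the same interpreter you use in Spyder, because different Python installations on one machine have different search paths.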
Oops! Error again!
The Twitter API has rate limits, which restrict how many tweets you can get within a time frame. Based on the current script, you can cover 300ish users in a 15-minute window. Once you hit the limit, you will see an error message pop up.
There are two ways to get around the restriction:
1. Wait 15 minutes before starting another run.
2. Create multiple Twitter apps and get multiple sets of API keys. Once you use up the quota in one run, paste in a new set of keys to start a new run!
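The first option, waiting out the 15-minute window, can also be automated. The sketch below is not part of the tutorial's script: call_with_wait and its parameters are made-up names, and the caller decides what counts as a rate-limit error. With Twython, that check would presumably be isinstance(error, TwythonRateLimitError).

```python
import time


def call_with_wait(fetch, is_rate_limit, wait_seconds=15 * 60,
                   max_tries=3, sleep=time.sleep):
    """Call fetch(); on a rate-limit error, wait and try again.

    Any other error, or a rate-limit error on the last try, is re-raised.
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception as error:
            if not is_rate_limit(error) or attempt == max_tries - 1:
                raise
            print("Rate limit hit; waiting %d seconds..." % wait_seconds)
            sleep(wait_seconds)
```

The sleep argument exists so the waiting can be swapped out when trying the function; in normal use you would leave it at its default.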
But pay attention to the block of code highlighted in the screenshot. The number 0 means that the script starts with the user listed in the first row.
Because you will hit the rate limit, you will need to run the code multiple times to finish crawling all users’ tweets. So, make sure to change the starting row number!
For example, if the first run covered user (0) to user (150) and then ran into the rate limit, you should put 151 as the starting number in the second run.
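The restart logic described above can be sketched as a loop that reports where it stopped, so that you know which number to enter for the next run. This is an illustration only; crawl_from, the fetch argument, and the sample handles are made up, and the tutorial's script may track its position differently.

```python
def crawl_from(handles, start, fetch):
    """Fetch tweets for handles[start:], one user at a time.

    Returns the index of the first user that was NOT completed, which is
    the starting number to use in the next run.
    """
    for index in range(start, len(handles)):
        try:
            fetch(handles[index])
        except Exception as error:
            print("Stopped at row %d (%s): %s" % (index, handles[index], error))
            return index
    return len(handles)


def demo_fetch(handle):
    """Stand-in for the real download; pretends the limit is hit at one user."""
    if handle == "example_user_c":
        raise RuntimeError("rate limit")


handles = ["example_user_a", "example_user_b", "example_user_c", "example_user_d"]
next_start = crawl_from(handles, 0, demo_fetch)
print("Start the next run at row %d" % next_start)
```

Here the first run covers rows 0 and 1 and stops at row 2, so the second run should start at 2, in the same way that covering users 0 to 150 means starting the next run at 151.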