Harvesting Data from Twitter Workshop: Hands-on Experience
Dr. Nora alTwairesh, Ms. Tarfa alBuhairi, Ms. Mawaheb alTuwaijri, and Ms. Afnan alMoammar
Content
• Introduction to the Twitter API
• Some ready-to-use tools (no programming)
• Comparison between R and Python
• R
• Python
WHY TWITTER?!
Why Twitter
• Twitter has become a mass information hub that can be used to study the evolution of any issue: a revolutionary machine.
• Research disciplines that study Twitter data span computer science, information science, communications, business, economics, education, medicine, political science, and sociology.
• Recent studies show that 60% of daily Arabic tweets are from Saudi Arabia.
Why Twitter
Hamdy Mubarak and Kareem Darwish. 2014. Using Twitter to collect a multi-dialectal corpus of Arabic. ANLP 2014:1.
Twitter API
• Free access covers tweets posted in the last 7 days, within a certain rate limit.
• Tweets posted more than 7 days ago are considered historical tweets and must be purchased through third-party providers.
• The Twitter API provides three interfaces for tweet collection: the Streaming API, the REST API, and the Search API.
Streaming API
• The Streaming API provides real-time tweets in a live, push-based fashion.
• Requested tweets flow continuously as they are posted on Twitter. The stream is delivered at three bandwidths: “spritzer” (1%), “gardenhose” (10%), and “firehose” (100% of all tweets posted on Twitter).
• A regular user wanting to collect tweets will be granted spritzer access.
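To get a feel for the three bandwidths, the arithmetic can be sketched in Python. This is only an illustration: the function name is hypothetical, and the daily total used in the demo is an assumed figure, not a number from this workshop.

```python
# Sampling rates of the three Streaming API bandwidths, as listed above.
STREAM_TIERS = {"spritzer": 0.01, "gardenhose": 0.10, "firehose": 1.00}


def expected_tweets(total_tweets, tier):
    """Return roughly how many of total_tweets a given tier would deliver."""
    return int(round(total_tweets * STREAM_TIERS[tier]))


if __name__ == "__main__":
    assumed_daily_total = 500000000  # assumption, for illustration only
    for tier in sorted(STREAM_TIERS, key=STREAM_TIERS.get):
        print("%s: %d" % (tier, expected_tweets(assumed_daily_total, tier)))
```

With spritzer access, a collection therefore sees about one tweet in every hundred posted, which matters when estimating how long a harvest must run to reach a target corpus size.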
REST API
• The REST API was specifically designed for programmatic access to read and write Twitter data.
• Third-party applications that interact with Twitter are provided with a large set of REST API methods to build on.
• Access to the REST API is also rate-limited: the limit is 150 requests per hour.
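One simple way to stay under an hourly limit is to space requests evenly. The sketch below assumes the 150 requests per hour quoted above; `paced_calls` and its parameters are hypothetical helpers, not part of any Twitter library.

```python
import time

REQUESTS_PER_HOUR = 150  # limit quoted on the slide above


def min_interval_seconds(requests_per_hour=REQUESTS_PER_HOUR):
    """Seconds to wait between calls so the hourly limit is never exceeded."""
    return 3600.0 / requests_per_hour


def paced_calls(call, n_calls, interval=None, sleep=time.sleep):
    """Run call() n_calls times, sleeping between consecutive calls."""
    if interval is None:
        interval = min_interval_seconds()
    results = []
    for i in range(n_calls):
        if i > 0:
            sleep(interval)  # wait before every call except the first
        results.append(call())
    return results
```

At 150 requests per hour the spacing works out to 24 seconds per request. The `sleep` parameter is injectable so the pacing can be tried out without actually waiting.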
Search API
• Like the REST API, the Search API is pull-based. It replicates the search functionality provided on the Twitter website. However, retrieved tweets are restricted to the past 7 days.
• The Search API is not appropriate for high-throughput, real-time data acquisition. As such, Twitter Inc. discourages its use and plans to discontinue it in the future.
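The 7-day restriction can be checked before planning a collection. The following Python 3 sketch tests whether a tweet timestamp still falls inside the search window. The timestamp format is an assumption (the `created_at` style commonly seen in tweet JSON), and `within_search_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

SEARCH_WINDOW = timedelta(days=7)  # window stated on the slide above
# Assumed timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def within_search_window(created_at, now=None):
    """Return True if the tweet was posted within the last 7 days."""
    posted = datetime.strptime(created_at, CREATED_AT_FORMAT)
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - posted
    return timedelta(0) <= age <= SEARCH_WINDOW
```

Tweets that fail this check are the "historical" ones that the free interfaces will not return.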
Create a Twitter App
• To access the Twitter API you need to create a Twitter app. Follow this simple tutorial to do so: https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/
• You will use the OAuth settings in both R and Python:
• Consumer Key
• Consumer Secret
• OAuth Access Token
• OAuth Access Token Secret
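It is usually safer to keep these four values out of the script itself. A minimal sketch, assuming they are stored in environment variables; the variable names below are hypothetical and can be anything you choose.

```python
import os

# Hypothetical environment variable names, one per OAuth setting listed above.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four OAuth values as a dict, or fail if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise ValueError("Missing Twitter credentials: " + ", ".join(missing))
    return dict((name, environ[name]) for name in CREDENTIAL_VARS)
```

The resulting dict can then be handed to whichever Twitter package you use, in R or Python, without the secrets ever appearing in shared code.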
Tools to Collect Tweets
• NodeXL: https://nodexl.codeplex.com/
• Tweet Archivist: https://www.tweetarchivist.com/
• Twitter Archiving Google Spreadsheet (TAGS): https://tags.hawksey.info/
What is R?
• Ross & Robert: R was created by Ross Ihaka and Robert Gentleman.
Why R?
• Statistics
• Machine Learning
• Data Analysis
• Also: a programming language
R allows you to integrate with code written in C++, Java, Python, and R.
Fastest-growing language
https://www.r-bloggers.com/r-is-the-fastest-growing-language-on-stackoverflow/
Examples
Now ..
Open your laptop, please
Steps to install R
1: Install R:
• Windows: https://cran.r-project.org/bin/windows/base/
• macOS: http://cran.r-project.org/bin/macosx/
2: Install RStudio (after installing R):
• https://www.rstudio.com/products/rstudio/download3/
3: Install these packages (see the user manual):
• streamR, ROAuth, RJSONIO, RTextTools, e1071, SparseM
User manual:
• http://www.devchakraborty.com/RunningRJafroc.pdf
R Packages list:
• https://cran.r-project.org/web/packages/available_packages_by_date.html
Developing Packages with RStudio:
• https://support.rstudio.com/hc/en-us/articles/200486488?version=0.99.903&mode=desktop
• https://cran.r-project.org/doc/manuals/R-exts.html
Useful URLs
• https://www.r-bloggers.com
• https://www.r-bloggers.com/how-to-learn-r-2/
• http://www.slideshare.net/ChiuYW/r-language-tutorial
• https://www.rwaq.org/courses/introduction-r-programming
• https://www.researchgate.net/publication/288485806_Hybrid_Sentiment_Analyser_for_Arabic_Tweets_using_R
Python
• Two versions: 2.7 and 3.x
• Twitter packages: twitter, tweepy
• IDE: Anaconda, IPython Notebook (Jupyter)
Installing Python
• Install Anaconda from here: https://www.continuum.io/downloads
• Choose the Python 2.7 version (only for this tutorial)
• Install the twitter package: from the command line (terminal) type: pip install twitter
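After installing, a quick check confirms that Python can actually find the package. A minimal sketch; `is_installed` is a hypothetical helper, and it only assumes that the pip package above is imported under the module name `twitter`.

```python
import importlib


def is_installed(module_name):
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("twitter package available: %s" % is_installed("twitter"))
```

If this prints False, the usual cause is that pip installed the package into a different Python installation than the one Anaconda runs.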
Comparison between R and Python
• https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis#gs.GuXGfAc
• http://blog.udacity.com/2015/01/python-vs-r-learn-first.html
• http://www.dataschool.io/python-or-r-for-data-science/
Contact Us
ASA Research Group
Twitter: @ASA__IU
Email: [email protected]
Website: http://asa.imamu.edu.sa/
IWAN Research Group
Twitter: @IWAN_RG
Email: [email protected]
Website: http://iwan.ksu.edu.sa
Thank you,
See you later …
THE END ..