
INTRODUCTION TO RESEARCH WITH TWITTER

by Simon Moss

Introduction

In previous decades, researchers tended to depend on surveys, observations, interviews, and other intensive methods to collect data. During more recent years, however, researchers have begun to depend more on social media to collect data. Twitter is the social media site that is, arguably, the most amenable to research. For example, researchers have used Twitter to explore

· the main concerns that people, such as HDR candidates, have expressed about some issue, such as supervisors

· the characteristics of people who have conveyed some attitude or concern

· whether these attitudes or concerns differ across specific demographics, and so forth

This document offers preliminary insights into how Twitter can be used to conduct research efficiently (for a summary, see Murphy, 2017; for information on text analysis more broadly, see Banks et al., 2018). In addition, this document offers some insight into the benefits and challenges of Twitter as well as the techniques you can apply to analyse tweets most effectively.

Install R and R studio

To undertake these techniques, you will need to use the software tool called R. If you have not used R before, you can download and install this software at no cost. To achieve this goal

· visit https://www.cdu1prdweb1.cdu.edu.au/files/2020-08/Introduction%20to%20R.docx to download an introduction to R

· read the section called Download R and R studio

· although not essential, you could also skim a few of the other sections of this document to familiarize yourself with R.

Create a Twitter Developer account

The benefits of the Twitter API

Suppose you wanted to ascertain the characteristics of PhD candidates, such as their location, that affect their attitudes towards their supervisors. In principle, you could

· visit twitter.com and enter “PhD supervisor” in the Search box

· enter the tweets into an Excel spreadsheet

· delete irrelevant tweets

· finally, perform a variety of analyses

In practice, however, this sequence of actions is inefficient. Instead, you can access a program called the Twitter API, or application programming interface, to undertake these actions, and many more complex operations, more rapidly.

Create a Twitter Developer Account

To access this API, you need to follow a sequence of activities. These activities are straightforward but might demand some time—such as 30 to 60 minutes. In addition, you will have to wait overnight for Twitter to respond to a request. In particular, you should first visit twitter.com. If you have yet to create an account, press “Sign up” and follow the instructions. Then, if you like, perhaps for 15 minutes or so, you could familiarize yourself with Twitter.

Second, visit https://apps.twitter.com/. You might need to log in with your Twitter username and password. Press the button “Create an app”. If this button does not appear, contact [email protected] or an IT specialist. Third, you might then receive the following screen. This screen prompts you to organize a developer account, a privilege that enables you to develop apps that utilize Twitter. Press “Apply”.

To organize a developer account, you may be prompted to complete the following set of questions.

In particular, choose “Doing academic research” and then “Next”. These actions will generate another set of questions. This second batch of questions includes

· in which country do you live

· what would you want us to call you—like a username—and so forth.

Finally, after pressing “Next”, you will receive a final set of questions. As you scroll down, you will be prompted to answer a series of questions. The following table outlines these questions as well as some illustrative answers.

Questions to answer

Example of responses

Please describe how you will analyse Twitter data including any analysis of Tweets or Twitter users

· I will analyze the Tweets to explore attitudes of PhD and Masters by Research candidates towards their supervisors.

· For example, I will assess whether the gender of candidates and supervisors affects these attitudes.

Will your app use Tweet, Retweet, like, follow, or Direct Message functionality?

· Tweets and Retweets

Do you plan to display Tweets or aggregate data about Twitter content outside of Twitter?

· I will not present specific Tweets in publications but will report patterns in the Tweets

Will your product, service or analysis make Twitter content or derived information available to a government entity?

· No

To proceed, press “Next” on this page and “Looks good” on the next page to generate the following display. These options are not available until you answer the mandatory questions; on some questions, you also need to exceed a minimum number of characters.

Finally, as indicated in this message, you will receive an email, perhaps the next day. After you press the link on this email, you are granted access to the Twitter API.

How to access the Twitter API

The previous section merely clarified how to create a Twitter Developer Account—an account that is usually necessary to access the Twitter API. You are now ready to request access to the Twitter API. In particular, after you create the Twitter Developer Account, you are likely to receive a screen that resembles the following display.

If you receive this screen, or a similar display, press “Create an app”. Sometimes, you need to select “Create an app” twice to generate the following screen. Your task is to answer the mandatory questions—the questions that are accompanied by the word “Required”. For example,

· in the box labelled “App name”, enter a name you will remember, such as “TutorialCDU”

· in the box labelled “Application description”, you can describe your research project again

· in the box labelled “URL”, enter any website, such as your Twitter homepage, like https://twitter.com/SimonMoss7

· in the box labelled “Callback URLs”, enter http://127.0.0.1:1410

After you complete this form, something like the following screen will appear

In this screen, you should locate three tabs: “App details”, “Keys and tokens”, and “Permissions”. Press the tab “Keys and tokens” to generate the following screen.

Press “Create” under “Access token and access token secret”. Finally, you need to copy and paste the four codes—the two Consumer API keys and the two Access tokens—into another document, such as Word. You will utilise this information in the next phase.

Install packages and connect R to the Twitter API

Finally, before you can search, extract, and analyse Twitter data, you need to connect R to the Twitter API account you developed. To achieve this goal, first install the relevant packages. A package is a set of functions, developed by a team of programmers, designed to perform a particular set of techniques, such as regression models or Twitter analysis. To analyse Twitter data, you should install

· twitteR

· RCurl

· RColorBrewer

· NLP

· tm

· wordcloud

· stringr

To install these packages, while connected to the internet, you should click “Install”, an option halfway down the screen, towards the right side.

This option, when clicked, generates the following screen. In the space under “Packages”, enter twitteR and then press Install. Note the uppercase R. Repeat this procedure to install the other packages as well.

These packages are now installed on your computer but reside in an archive. To enable R to utilize these packages, you need to write some additional code, as illustrated in the following screen. Specifically, on the left side, you should enter “library(twitteR)” and press return. Repeat with the other packages as well.

Finally, to connect R to your API Twitter account, enter the code that appears in the left side of the following table. The quotation marks should be written in R rather than Word. The reason is that R recognises this simple format— " —but not the more elaborate format that often appears in Word, such as “ or ”.

Code to enter

Explanation or clarification

consumer_key <- "MYpEY1GHSH24735H"

· But replace the characters within these quotation marks with your consumer key, saved during the previous phase.

· Retain these quotation marks

consumer_secret <- "Tr0emXhJH8dVE0BB"

· Again, replace the characters within these quotation marks with your consumer secret

access_token <- "623677214-aVlIe8wfhHiJjr7"

· Enter your access token instead

access_token_secret <- "OiVWpraMS7Mkh93"

· Enter your access token secret instead

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_token_secret)

· You do not need to adjust this code at all

If R prompts you to consider “direct authentication”, approve this authentication. After you complete this phase, R can now access your Twitter API and thus search as well as extract information from Twitter. Retain this code, because you need to enter this code again before each session.

Example 1: Search a particular term

To demonstrate how R can be applied to search Twitter, enter a modified variant of the code that appears in the left side of the following table.

Code to enter

Explanation or clarification

first_search <- searchTwitter("PhD", n=100, lang="en")

· This command retrieves 100 recent tweets that mention “PhD”

· You can change this number

· The search is limited to English tweets

· These tweets are then stored in a variable, like a folder, called “first_search”. You could use any label you prefer.

· In this code, and in subsequent code, function names such as searchTwitter must be typed exactly as shown; only the labels you create, such as “first_search”, may be changed

first_searchDF <- twListToDF(first_search)

· This command then converts these tweets into a data frame—a format that R can utilize more readily in the future

· You would replace “first_search” with the label you used in the previous command

file.timeline <- paste("PhD", "dataFile1.csv", sep="")

· This command is primarily designed to label the file you will soon create

· You can replace “dataFile1” with any suitable label for a data file.

· You can replace “file.timeline” with any label you prefer—as long as you utilize this label in the next command

write.csv(first_searchDF, file.timeline)

· This command converts the Twitter data into a csv file—a format you can open in Excel
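As an aside, the effect of paste can be checked in isolation. This base-R snippet, using the same labels as above, shows how the two strings are joined to name the file:

```r
# sep = "" joins the two strings with no separator between them
file_name <- paste("PhD", "dataFile1.csv", sep = "")
print(file_name)  # "PhDdataFile1.csv"
```

The csv file created by write.csv will therefore appear in the working directory under this combined name.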

You should now be able to locate and open this csv file, called PhDdataFile1.csv, because the paste command prefixed the search term to the file name. This file will appear in your working directory. To locate this file

· perhaps merely search for this file on your computer, as you might search for any file

· or, in R studio, choose the “Session” menu at the top. Select “Set Working Directory” and then “Choose directory”. The directory that appears is the location in which your csv file is stored.

Once located, you can open this file, as illustrated in the following screen. To illustrate

· each row corresponds to a separate tweet

· column B presents the tweet

· column C indicates whether someone else had “favorited” the tweet

· column M indicates the number of times this tweet had been retweeted by someone else

When using the searchTwitter command, you can also include several other constraints. For example

· you can restrict the search to a specific date range. You would enter first_search <- searchTwitter("PhD", n = 100, lang = "en", since = "2018-01-30", until = "2019-01-30"); note that dates follow the format YYYY-MM-DD

· you can extract tweets that include two or more search terms, such as first_search <- searchTwitter("PhD + university", n = 10, lang = "en")

· you can search the most recent Tweets only, such as first_search <- searchTwitter("PhD", n = 10, lang = "en", resultType = "recent")

Example 2: Collate all the tweets from one person or account

The previous example collated the latest 100 tweets that mentioned a specific word—in this instance, PhD. You can also collate the latest tweets from a single person or account, such as the tweets that Donald Trump has sent. To achieve this goal, you would enter a modified variant of the code that appears in the left side of the following table.

Code to enter

Explanation or clarification

account <- "realdonaldtrump"

· You would replace “realdonaldtrump” with the name of the Twitter account from which you want to extract the tweets

account.timeline <- userTimeline(account, n = 1000, includeRts = TRUE)

· This command extracts the last 1000 tweets from the account “realdonaldtrump”

· The number after n refers to the number of tweets you want to distill

· includeRts = TRUE merely indicates you would like to include, rather than exclude, retweets.

· You can replace TRUE with FALSE if you prefer

SecondDF <- twListToDF(account.timeline)

· This command then converts these tweets into a data frame—a format that R can utilize more readily in the future

· The data frame is labelled SecondDF—but could be labelled anything

file.timeline2 <- paste(account, "dataFile2.csv", sep="")

· See the previous example for a brief explanation of this command

write.csv(SecondDF, file.timeline2)

· See the previous example for a brief explanation of this command

To locate this csv file, apply the same procedure as you did during the previous example.

Example 3: Word cloud

After you collate the tweets that relate to a specific topic or account, you can display or analyse these data. For example, you can generate a word cloud. This example will not only demonstrate how to generate a word cloud, but will also show you how you can refine the data—such as remove punctuation or other distracting features.

Convert the list of tweets to a vector and then a corpus

Suppose you have already collated 100 tweets that revolve around PhD, using the command first_search <- searchTwitter("PhD", n=100, lang="en"). Rather than translate this information to a csv file, you could instead convert these tweets to a vector and then a corpus. At first glance, these terms might seem meaningless. In essence

· a vector is a series of items, such as [bob, frank, mary, betty]

· a corpus is a set of texts, such as a series of documents.

· these definitions can seem confusing, however

· in practice, you merely need to know that some R procedures that analyse tweets expect these data to be stored as a corpus

To convert tweets into a vector and then into a corpus, you would enter a modified variant of the code that appears in the left side of the following table.

Code to enter

Explanation or clarification

first_search_as_vector <- sapply(first_search, function(x) x$getText())

· The command sapply converts the tweets in “first_search” to a vector called “first_search_as_vector”

· The x$getText extracts only the text rather than other metadata

first_search_as_corpus <- Corpus(VectorSource(first_search_as_vector))

· The command Corpus translates the vector “first_search_as_vector” to a corpus called “first_search_as_corpus”

Clean tweets

The next step is to remove extraneous information—such as punctuation, function words like “the” or “is”, and numbers—from this corpus of tweets. The reason is that such information is not useful to some displays or analyses, such as word clouds. To eliminate this extraneous information, you could enter a modified variant of the code that appears in the left side of the following table.

Code to enter

Explanation or clarification

cleaned_data1 <- tm_map(first_search_as_corpus, removePunctuation)

· Removes the punctuation from the tweets stored in first_search_as_corpus

· The updated corpus is stored in “cleaned_data1”

cleaned_data1 <- tm_map(cleaned_data1, content_transformer(tolower))

· Converts the text to lowercase, primarily to improve the aesthetics of the word cloud

cleaned_data1 <- tm_map(cleaned_data1, removeWords, stopwords("english"))

· Removes all English stopwords—defined as functional words that are not usually meaningful, such as “is” or “the”

cleaned_data1 <- tm_map(cleaned_data1, removeNumbers)

· Removes all numbers from the data file

cleaned_data1 <- tm_map(cleaned_data1, stripWhitespace)

· Removes unnecessary white spaces—as in gaps between words

Produce the word cloud

Finally, you are ready to produce the word cloud. For example, you could enter code like

wordcloud(cleaned_data1, random.order = F, max.words = 40, scale = c(3, 0.5), colors = rainbow(50))

This command should generate a plot that typically appears on the right side of your screen. The plot is likely to resemble something like the following example.

In this display, the largest word, phd, represents the term used most frequently in the tweets. This result is inevitable, because “phd” was the search term. To prevent this outcome,

· the researcher could have created a corpus in which the word “phd” was removed

· to achieve this goal, the researcher could enter the code cleaned_data1 <- tm_map(cleaned_data1, removeWords, c("phd"))

You can adjust the code to modify the display. The subsequent table clarifies some possible amendments to this code.

Terms that could be amended

Details

cleaned_data1

· Represents the corpus in which you store the cleaned data

random.order=F

· When you include “random.order = F”, the terms that are used most frequently—and thus represented in a larger font—appear closer to the center

· If this term is removed, the larger terms do not necessarily appear closer to the center

max.words = 40

· You can increase or decrease the number of terms in this display

scale = c(3, 0.5)

· You should experiment with these numbers

· As you shift these numbers, the relative difference in size between the smaller and larger terms changes

colors = rainbow(50)

· You can adjust the color

· For example, you could enter colors = "red"
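Although wordcloud handles the counting internally, you can inspect the underlying frequencies yourself. The following base-R sketch, which uses a small invented vector of words rather than real tweets, computes the counts that determine the size of each term:

```r
# A toy vector of words, standing in for the cleaned tweets
words <- c("phd", "thesis", "supervisor", "phd", "writing", "thesis", "phd")

# table() counts each distinct word; sort() ranks them from most to least frequent
freqs <- sort(table(words), decreasing = TRUE)
print(freqs)
# phd appears 3 times, thesis 2 times, supervisor and writing once each
```

The most frequent term in this ranking is the word that wordcloud would draw largest and, with random.order = F, closest to the center.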

Example 4: Sentiment analysis

Tweets are often subjected to a suite of techniques called sentiment analysis. The aim of sentiment analysis is to assess the emotional content of tweets and other texts, such as product reviews (Bing, 2015). For example, sentiment analysis may gauge whether a person feels positively or negatively towards some product or proposal. Researchers apply a range of methods to achieve this goal. In general, they utilize these methods to

· classify words into positive and negative sets

· identify the percentage of positive and negative words in a specific body of text, such as tweets

Hundreds of studies have utilised sentiment analysis—a technique that is sometimes called opinion mining (but see Tsytsarau & Palpanas, 2012, for a distinction between the two terms). To illustrate, sentiment analysis has been utilised to reveal that

· after people are exposed to negative content in the media, they are more likely to report symptoms of depression or physical illness later (Wormwood, Devlin, Lin, Barrett, & Quigley, 2018)

· individuals become increasingly likely to express negative words during their teenage years until they reach about 17 (Hipson, 2019)

· the mental state of individuals diagnosed with anorexia nervosa affects the words they use in their writing (Spinczyk, Nabrdalik, & Rojewska, 2018).

Convert all the text into a series of separate words

To conduct sentiment analysis in R, researchers usually need to convert their corpus of tweets into a series of distinct words. To achieve this goal, you could enter a modified variant of the code that appears in the left side of the following table. You do not need to understand this code fully. Most of this code is redundant if you have already cleaned the data, as illustrated in the previous example; only the first command and the final two commands are crucial.

Code to enter

Explanation or clarification

collapsed_data1 <- paste(cleaned_data1, collapse=" ")

· This code is designed to convert all the tweets into a single line—necessary to conduct some of the subsequent phases

collapsed_data1 <- gsub(pattern = "\\W", replace= " ", collapsed_data1)

· This code removes some of the punctuation

collapsed_data1 <- gsub(pattern = "\\d", replace= " ", collapsed_data1)

· This code removes the numbers

· That is, the “pattern” argument locates digits.

· The “replace” argument replaces these digits with an empty space

collapsed_data1 <- tolower(collapsed_data1)

· This code shifts all the letters to lowercase

collapsed_data1 <- removeWords(collapsed_data1, stopwords())

· This code removes all stopwords—functional words such as “is” or “the”

collapsed_data1 <- gsub(pattern = "\\b[A-Za-z]\\b", replace= " ", collapsed_data1)

· This code removes all letters that appear by themselves—and, therefore, are not words. The only exceptions, a and I, would have been removed in the previous step.

collapsed_data1 <- str_split(collapsed_data1, pattern = "\\s+")

· This code splits all the content into separate words—because, later, you want to examine each word separately

final_data1 <- unlist(collapsed_data1)

· This code then converts this list of separate words into a form that is necessary for the next step.
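To preview what this pipeline produces, the following base-R sketch applies the same cleaning and splitting steps to one invented tweet; it substitutes strsplit for stringr’s str_split and a tiny hand-picked stopword list for stopwords(), so it runs without any packages:

```r
tweet <- "My PhD supervisor is great!! 100%"

text <- tolower(tweet)                     # shift all letters to lowercase
text <- gsub("\\W", " ", text)             # replace punctuation with spaces
text <- gsub("\\d", " ", text)             # replace digits with spaces
text <- gsub("\\b[a-z]\\b", " ", text)     # drop single stray letters
for (w in c("my", "is")) {                 # toy stopword list, for illustration only
  text <- gsub(paste0("\\b", w, "\\b"), " ", text)
}
text <- gsub("\\s+", " ", trimws(text))    # collapse unnecessary white space
words <- unlist(strsplit(text, " "))       # split the content into separate words
print(words)  # "phd" "supervisor" "great"
```

The resulting vector of separate words is the format that the sentiment-matching commands in the next section operate on.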

Download positive words and negative words

The previous set of commands was merely designed to convert your tweets into a series of words, stored under a particular label. The next phase is to ascertain which of these words are positive and which are negative. To achieve this goal, researchers have developed a set of positive words and a set of negative words. You need to download these word lists onto your computer and then compare them to your tweets. Specifically, you could complete the activities that appear in the following table.

Activity to complete

Explanation or clarification

Proceed to http://ptrckprry.com/course/ssd/data/positive-words.txt

· On this site is a file that comprises an extensive set of positive words

Copy and paste the words, beginning with “abound”, into a Word file

Save this file as a text file in the working directory of R

· To identify the working directory of R, enter the command, getwd()

· This command specifies the working directory—the directory that R uses by default. For example, the directory might be /users/johnsmith

· After choosing “File” and “Save as” in Word, select the file format called plain text or txt

· Label the file “positivewords” or something similar

· Save this file in the working directory, such as /users/johnsmith

Proceed to http://ptrckprry.com/course/ssd/data/negative-words.txt

· Complete the same procedure you applied to the positive words

You have now downloaded a set of positive words and a set of negative words onto your computer. Finally, to complete your sentiment analysis, you need to compare your Tweets to these words. To achieve this goal, you could enter a modified variant of the code that appears in the left side of the following table.

Code to enter

Explanation or clarification

positive_words <- scan("positivewords.txt", what = "character")

· This code first reads the file of positive words, saved during the previous activity, into a vector called positive_words; adjust the file name if you labelled the file differently

match(final_data1, positive_words)

· This code determines whether the words in your tweets, stored in final_data1, match any of the positive words

· The output will resemble something like NA NA NA NA 264 NA 143 …

· NA corresponds to words in your tweets that do not match the positive words

· The numbers correspond to words that do match the positive words

number_of_positive_words <- sum(!is.na(match(final_data1, positive_words)))

· This code counts the items in the previous output that are not NA, and thus counts the positive words

number_of_positive_words

· This code displays the outcome of the previous formula, and thus the number of positive words in the tweets

negative_words <- scan("negativewords.txt", what = "character")

match(final_data1, negative_words)

· Same as the previous code, except for negative words

number_of_negative_words <- sum(!is.na(match(final_data1, negative_words)))

· Counts the negative words

number_of_negative_words

· Displays the number of negative words

This code will thus uncover the number of positive words and the number of negative words in your tweets. Some researchers then compute an index, using a formula that resembles the following equation (Spinczyk et al., 2018), in which higher values indicate a more positive sentiment.

· (number of positive words – number of negative words) / (number of positive words + number of negative words)
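The matching, the counting, and the index can be illustrated end to end with this base-R sketch; the word lists and tweet words below are invented stand-ins for the downloaded files and for final_data1:

```r
# Toy lexicons, standing in for the downloaded positive and negative word lists
positive_words <- c("great", "happy", "good")
negative_words <- c("bad", "awful", "sad")

# Toy tweet words, standing in for final_data1
final_data1 <- c("supervisor", "great", "feedback", "bad", "good")

n_pos <- sum(!is.na(match(final_data1, positive_words)))  # 2: "great" and "good"
n_neg <- sum(!is.na(match(final_data1, negative_words)))  # 1: "bad"

# The index above: ranges from -1 (all negative) to +1 (all positive)
index <- (n_pos - n_neg) / (n_pos + n_neg)
print(index)  # 0.3333333
```

With two positive words and one negative word, the index is (2 − 1) / (2 + 1), a mildly positive sentiment.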

Lemmatization

To improve these analyses, researchers should apply a technique called lemmatization. In particular, researchers should convert individual words to their canonical form. For example,

· the word “cries” would be converted to “cry”

· a variety of procedures and software programs, such as TreeTagger, can be utilized to achieve this goal.
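Dedicated tools such as TreeTagger derive canonical forms linguistically, but the basic idea can be sketched with a hand-built lookup table; the mappings below are invented for illustration:

```r
# A toy lemma dictionary; real tools derive these mappings linguistically
lemmas <- c(cries = "cry", crying = "cry", supervisors = "supervisor")

lemmatize <- function(words) {
  # Replace a word with its canonical form when the dictionary knows it
  ifelse(words %in% names(lemmas), lemmas[words], words)
}

print(lemmatize(c("cries", "supervisors", "phd")))  # "cry" "supervisor" "phd"
```

Applying such a mapping before the matching step means that “cries” and “crying” are counted as the same word, which sharpens the sentiment counts.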

Machine learning

This sentiment analysis utilised a dictionary of positive words and negative words to assess the emotional content of these tweets—an approach that is sometimes called lexicon-based categorization. Some researchers utilise more comprehensive dictionaries, such as the NRC Emotion Lexicon (Mohammad & Turney, 2010). Typically, to validate these dictionaries

· participants rate the extent to which a set of words are positive or negative

· to construct the dictionaries, researchers then extract words that tend to be perceived as positive and words that tend to be perceived as negative (Mohammad & Turney, 2010)

Rather than invite participants to evaluate words, some researchers utilise other techniques to distinguish positive words and negative words. In particular, many researchers utilise machine learning to achieve this goal. To illustrate

· Go, Bhayani, and Huang (2009) subjected 1,600,000 messages to a series of techniques, such as Naïve Bayes, Maximum Entropy, and Support Vector Machines.

· These techniques were designed to identify words that tend to coincide with positive emoticons, such as smiley faces, as well as to identify words that tend to coincide with negative emoticons (for other examples, see Giachanou & Crestani, 2016).

Limitations and complications with the use of Twitter data

Twitter, although an excellent source of data, is not infallible. The following table outlines some of the limitations and problems that researchers who extract data from Twitter should consider (see also Giachanou & Crestani, 2016; Murphy, 2017).

Problem

Clarification or illustration

How to address the problem

Retweets. Individuals often retweet—that is, share the tweets of other users.

· The researcher might inadvertently overestimate the frequency of attitudes that had merely been retweeted

· Retweets can be excluded from the search

Negation. Many of the algorithms and techniques that researchers utilise to analyse Twitter overlook negation

· Many techniques classify “I am happy” and “I am not happy” as equivalent, because the only key term in these phrases is “happy”

· This limitation is not specific to Twitter but relevant to many techniques that analyze text

· Kiritchenko et al. (2014) presented an approach that can be applied to examine and address this issue

Data sparsity. Tweets include many misspelled words, abbreviated words, incorrect words, and other errors.

· Consequently, many of the terms in Twitter only appear a few times, sometimes limiting the capacity of some algorithms and software to analyze the data and uncover patterns

· See Saif et al. (2012) for an attempt to address this concern

Unique stop words. Words that are often perceived as meaningless in other formats—and thus deleted by many algorithms—are sometimes relevant in Twitter

· Many existing applications, such as software that is designed to conduct sentiment analysis, remove functional words, such as “the”, “and”, “is”, “who” and so forth

· Some of these words, such as “like”, tend to be more meaningful in Twitter

· Thus, applications that are relevant to other settings might not be suitable to Twitter

· See Saif et al. (2014) for an attempt to explore and address this matter

Multi-modal data and other unique features. Tweets also include images and videos—data that are not as simple to extract and analyse.

· Thus, researchers frequently overlook data that might be informative.

· Tweets may also include other distinct features that researchers tend to overlook, such as emojis, unconventional punctuation such as !!!, and uppercase or repeated letters that indicate emphasis

Multiple languages. Many tweets comprise more than one language

· Because tweets are limited in length, applications may not be able to decipher the languages

· Therefore, even if researchers attempt to limit the language to English, words from other languages might be extracted inadvertently

· Narr et al. (2012) developed a classifier, useful to researchers who want to conduct sentiment analysis, that is independent of language and, thus, may circumvent this problem.

Incomplete geographic information. The nation or city of participants is often hard to gauge

· Therefore, researchers cannot always characterize the demographics of users well

· Sometimes, geographic information can be distilled from location hashtags, the longitude and latitude of users, and other cues.

Spam. Some tweets are sent by advertisers or bots and, therefore, may be misleading

· You can omit users with too few or too many followers—a pattern that is common in spam advertisers

· See also Allem et al. (2017) for insights on how to distinguish human users from bots
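The negation problem in the table above is easy to demonstrate: a simple bag-of-words count, sketched here in base R with an invented two-word lexicon, scores “I am happy” and “I am not happy” identically because it only registers the word “happy”:

```r
# Toy lexicon, invented for illustration
positive_words <- c("happy", "great")

count_positive <- function(text) {
  # Split the sentence into lowercase words and count lexicon matches
  words <- unlist(strsplit(tolower(text), "\\s+"))
  sum(words %in% positive_words)
}

# Both sentences score 1, even though their meanings are opposite
print(count_positive("I am happy"))      # 1
print(count_positive("I am not happy"))  # 1
```

Approaches such as Kiritchenko et al. (2014), mentioned in the table, address this by treating words in the scope of a negator differently.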

References

Allem, J. P., Ferrara, E., Uppu, S. P., Cruz, T. B., & Unger, J. B. (2017). E-cigarette surveillance with social media data: Social bots, emerging topics, and trends. JMIR Public Health and Surveillance, 3(4), e98.

Banks, G. C., Woznyj, H. M., Wesslen, R. S., & Ross, R. L. (2018). A review of best practice recommendations for text analysis in R (and a user-friendly app). Journal of Business and Psychology, 33(4), 445-459.

Bing, L. (2015). Sentiment analysis. Cambridge, UK: Cambridge University Press.

Denecke, K., & Deng, Y. (2015). Sentiment analysis in medical settings: New opportunities and challenges. Artificial Intelligence in Medicine, 64, 17–27.

Giachanou, A., & Crestani, F. (2016). Like it or not: A survey of twitter sentiment analysis methods. ACM Computing Surveys (CSUR), 49(2), 1-41

Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. Technical report, Stanford.

Hipson, W. E. (2019). Using sentiment analysis to detect affect in children’s and adolescents’ poetry. International Journal of Behavioral Development, 43(4), 375-382.

Ikram, M. T., Afzal, M. T., & Butt, N. A. (2018). Automated citation sentiment analysis using high order n-grams: a preliminary investigation. Turkish Journal of Electrical Engineering & Computer Sciences, 26(4), 1922-1932.

Jung, H., Park, H. A., & Song, T. M. (2017). Ontology-based approach to social data sentiment analysis: Detection of adolescent depression signals. Journal of Medical Internet Research, 19, e259.

Kang, D., & Park, Y. (2014). Review-based measurement of customer satisfaction in mobile service: Sentiment analysis and VIKOR approach. Expert Systems with Application, 41, 1041–1050.

Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 50, 723–762.

Mohammad, S. M., & Turney, P. D. (2010). Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. Proceedings of the NAACL-HLT Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angeles, CA

Murphy, S. C. (2017). A hands-on guide to conducting psychological research on Twitter. Social Psychological & Personality Science, 8, 396–412.

Narr, S., Michael, H., & Albayrak, S. (2012). Language-independent twitter sentiment analysis. In Proceedings of the Knowledge Discovery and Machine Learning at LWA 2012.

Parthasarathy, G., & Tomar, D. C. (2015). A survey of sentiment analysis for journal citation. Indian Journal of Science and Technology, 8, 35.

Poria, S., Cambria, E., Winterstein, G., & Huang, G. B. (2014). Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69, 45-63.

Rill, S., Reinel, D., Scheidt, J., & Zicari, R. V. (2014). PoliTwi: Early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis. Knowledge-Based Systems, 69, 24–33.

Saif, H., Fernandez, M., & Alani, H. (2014). Automatic stopword generation using contextual semantics for sentiment analysis of twitter. In Proceedings of the ISWC 2014 Posters & Demonstrations Track at the 13th International Semantic Web Conference

Saif, H., He, Y., & Alani, H. (2012). Alleviating data sparsity for twitter sentiment analysis. In Proceedings of the Workshop on Making Sense of Microposts (#MSM2012): Big Things Come in Small Packages at the 21st International Conference on the World Wide Web.

Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. Sebastopol, CA: O’Reilly Media.

Spinczyk, D., Nabrdalik, K., & Rojewska, K. (2018). Computer aided sentiment analysis of anorexia nervosa patients’ vocabulary. Biomedical Engineering Online, 17(1), 19.

Tsytsarau, M., & Palpanas, T. (2012). Survey on mining subjective data on the web. Data Mining and Knowledge Discovery, 24, 478–514.

Wormwood, J. B., Devlin, M., Lin, Y.-R., Barrett, L. F., & Quigley, K. S. (2018). When words hurt: Affective word use in daily news coverage impacts mental health. Frontiers in Psychology, 9, 1–10.

Yadollahi, A., Shahraki, A. G., & Zaiane, O. R. (2017). Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys, 50(2), 1-33.