Towards a Social Negativity Index: Giving Content to Financial
Tweeting
Mohamed Al Guindy
Sprott School of Business, Carleton University
March 21st, 2018
ABSTRACT
I develop a linguistic index of investor negativity expressed on social media at the firm-day level
and call it the Social Negativity Index (SNI). Higher SNI levels correspond to lower stock returns
and greater trading volume. Consistent with the psychology literature, markets appear to respond
more to negative tweeting than they do to positive tweeting. For the universe of firms listed on the
NYSE, NYSE American and NASDAQ, firms with more retail ownership are tweeted about more,
suggesting that tweeting largely originates from the retail investor base. In addition, firms with
greater dispersion of analyst forecasts are tweeted about more, in what appears to be an attempt by
non-sophisticated investors to resolve the difference of opinion of analysts. The results suggest
that social media has assumed some of the roles traditionally associated with analysts and the
financial media.
JEL classification: G10, G12, G14
Keywords: Social media; Textual Analysis; Wisdom of the crowd; Media
1. Introduction
On August 28th, 2017, Gilead Sciences announced the acquisition of Kite Pharma. Five
days prior, an Artificial Intelligence (AI) system that monitors social media conversations had
predicted this imminent acquisition by observing changes in those conversations (Ram and
Wigglesworth, 2017). Given the rise of AI systems in financial markets, particularly in the domain
of social media, the goal of this paper is to develop a systematic methodology to quantify large-
scale financial information derived from social media. In particular, I develop a text-based Social
Negativity Index (SNI), reflecting total negativity about a stock on social media, and show that SNI
relates to daily stock performance and trading volume. More generally, I illustrate that social media
has assumed some of the functions traditionally associated with analysts and the financial press.
In recent years, Twitter was used to mobilize Egyptian street protestors during Egypt’s
Arab Spring (Acemoglu, Hassan, and Tahoun, 2018), to predict flu epidemics in New York City
(Broniatowski, Paul and Dredze, 2013), and to aid efforts dealing with hurricanes and other natural
disasters (Seetharaman and Wells, 2017). The relationship between social media and stock returns
has captured attention in recent years. For example, tweets about the pharmaceutical industry by
former US Presidential Candidate, Hillary Clinton, sent the industry stocks down on two separate
events (Egan, 2015; Wang, 2016). Tweets by Senator Bernie Sanders also affected stock
performance of pharmaceuticals (Bloomfield, 2016). Most recently, and perhaps most
prominently, tweets by US President Donald Trump influenced the stock performance of such
companies as Boeing (Lovelace, 2016), Lockheed Martin (Wang, 2016b), and Toyota (Rich,
2017). The influence of tweets from the US President has become so well-established that a mobile
application has been developed to track his tweeting activities and send notifications to investors
who own stocks in a company when the President tweets about it. As Rachel Mayer, co-founder
of Trigger Finance (the company that provides this service) puts it, “Tweets really do matter.”
While tweets from prominent figures can affect stock performance, the goal of this paper
is to treat the subject systematically and to determine the extent to which our knowledge of
traditional financial media extends to the domain of social media. In particular, I investigate the
relationship between social negativity expressed on Twitter, and stock returns and trading volume.
One of the contributions of this paper is to develop and make available a daily firm-level index of
total investor negativity expressed on social media – Social Negativity Index (SNI). Both tweeting
volume and SNI appear to be reflected in securities’ prices. Interestingly, one of the key findings
of Tetlock (2007), that investors respond more to negative language in the financial press than
they do to positive language, appears to extend to social media, which is congruent with the
psychology literature.
I also find that firms with less institutional ownership, and thus greater retail ownership,
are tweeted about more frequently than other firms. This is consistent with the notion that tweeting
likely originates from the retail investor-base. In the same vein, tweeting about firms is higher
where the dispersion of analyst forecasts is greater. This suggests that Twitter provides an outlet
where investors can discuss various views about a stock in the absence of analysts’ consensus.
This last point is particularly important as it suggests a role for social media in the information
production process – a function historically connected with analysts and traditional media.
The setting of this paper is a compelling one to study for a number of reasons. First, the
goal of this paper is to establish a daily systematic link between the aggregation of all opinions
about stocks, as depicted on Twitter, and stock performance. In doing so, this study mimics
previous studies about the financial media but does so in the context of social media. Second,
unlike studying tweets that originate from firms or from individuals, the number of tweets that
it is possible to analyze in this setting is very large (over 18 million tweets), and the sample encompasses every
firm listed on all major US exchanges. Third, most tweeting originating from firms is positive in
tone – as predicted by theories of selective disclosure in Verrecchia (1983) and Jung and Kwon
(1988). Tweeting from individuals, on the other hand, exhibits more variance in tone, thus allowing
for the development of the SNI.
The “Buzz index”
In April 2016, Sprott Asset Management launched the BUZZ (sentiment) Social Media
Insights ETF (NYSE: BUZ)1. This ETF, distributed by ALPS Portfolio Solutions, aggregates the
sentiment of all stocks in the US based on their social media sentiment. Using proprietary textual
analysis, Big Data, and Artificial Intelligence (AI) algorithms, the index selects 75 stocks with the
most positive sentiment to include in the ETF. The ETF itself is reconstructed monthly. Jamie
Wise, the developer of the BUZZ ETF says: “We discovered that the overall level of buzz or
sentiment around stocks was in fact predictive, and could lead to a process where you could select
stocks ranked based on that level of sentiment and ultimately come up with a portfolio of securities
that could outperform the market.2”
Tweets about stocks
Since its inception, Twitter has used the hashtag “#” symbol to identify the topic of a tweet.
For example, #StanleyCup mentioned in a tweet signifies that the tweet is about the Stanley Cup in
1 See http://www.businesswire.com/news/home/20160419005303/en/Investing-%E2%80%9CSocial%E2%80%9D-Sprott-BUZZ-Social-Media-Insights
2 See http://www.etf.com/sections/etf-industry-perspective/sprott-new-etf-captures-investor-buzz
particular. The use of the ‘#’ symbol not only makes it easy for individuals to tweet about topics,
but also facilitates searches for tweets about specific topics.
Because of the rise of Twitter discussions about the stock market, Twitter introduced the
“cashtag” symbol ($) in 2012. The cashtag is used in lieu of the hashtag to signify that a tweet is
about the stock of a specific firm3. For example, $AMZN stated in a tweet indicates that the tweet
is about the stock of Amazon Inc. The use of the cashtag makes it easy to identify and isolate
tweets that strictly pertain to the stock of a company.
Literature review
This paper relates to a number of strands in the literature. First, it relates to the literature
that examines the role of the media in financial markets. Second, it relates to the literature on
textual analysis. Finally, this paper relates to the emerging literature on the use of the Internet by
investors, and particularly social media, to communicate financial information.
The financial economics literature has established the role of the media in financial markets.
For example, Tetlock (2007) illustrated that stock markets respond to the content of a popular Wall
Street Journal article. Tetlock finds that higher media pessimism predicts downward pressure on
market prices in the short term. Tetlock, Saar-Tsechansky, and Macskassy (2008) further illustrate
the role of media sentiment in that they show that the language content of the media can be used
to predict stock returns and accounting earnings. Engelberg and Parsons (2011) illustrate the causal
impact of media in financial markets, and Fang and Peress (2009) show that media coverage affects
stock returns due to the breadth of information dissemination.
3 See https://www.cnet.com/news/twitter-introduces-ticker-symbol-cashtags-for-finance-searches/ for details about
the introduction of the cashtag.
The literature on textual analysis is an emerging strand of literature in finance and is often
combined with studies examining the role of the media in financial markets. Tetlock (2007) and
Tetlock, Saar-Tsechansky, and Macskassy (2008) use the Harvard-IV-4 psychological dictionary
to conduct textual analysis on media content. More recently, Loughran and McDonald (2011)
introduced a second dictionary to extract textual sentiment that pertains specifically to financial
language. The application of textual analysis to social media is an emerging area of interest in the
literature. As Loughran and McDonald (2016) comment “Hopefully, (textual analysis) methods
can be developed that are better able to capture the information in this [social media] very noisy
yet rich source of data.”
Due to advances in technology, the landscape of how investors gather and process
information has evolved. For example, investors use Internet stock message boards (Antweiler and
Frank, 2004). They also use Google to search for and gather financial information (Da, Engelberg,
and Gao, 2011; Drake, Roulstone and Thornock, 2012). Investors also turn to EDGAR to collect
financial information (Loughran and McDonald, 2017). Blankespoor, Miller and White (2014)
show that firms that use Twitter to communicate information achieve a lower bid-ask spread,
consistent with a reduction in information asymmetry. Jung, Naughton, Tahoun and Wang (2017)
show that firms’ tweets about earnings announcements can improve their information
environments. Chen, De, Hu and Hwang (2014) show that the collective opinion, or wisdom of the
crowd, expressed on Seeking Alpha, a popular investment crowd-sourcing platform,
can predict stock returns. Bartov, Faurel, and Mohanram (2016) show that opinions on Twitter
posted just before earnings announcements predict quarterly earnings. Chen, Hwang and Liu
(2016) examine tweeting of CEOs showing that such tweeting can increase customer base, and
improve stock liquidity, but that some of these effects are subsequently reversed. Chawla, Da, Xu
and Ye (2015) use data from TD Ameritrade to show that the diffusion of news, particularly trading
news, on social media is associated with lower bid-ask spreads on news days. Al Guindy (2016)
shows that corporate tweeting became significantly more prevalent after the Securities and
Exchange Commission (SEC) endorsed social media as an official channel for corporate
communication.
This paper proceeds as follows. Section 2 provides an overview of the data used,
particularly the Twitter dataset. Section 3 explores the predictability of tweeting about firms.
Section 4 constructs the Social Negativity Index (SNI), while section 5 examines returns and
trading volume. Section 6 conducts a vector autoregression (VAR) analysis, section 7
conducts robustness and additional tests, and section 8 concludes.
2. Data and summary statistics
2.1 Twitter data collection
To collect the tweets used in this project, I set up a small laboratory consisting of computers
that continuously collect financial tweets about all firms listed on the three major US exchanges:
NYSE, NYSE American, and NASDAQ. These computers run programs that I wrote in the Python
programming language and that make use of the Twitter Application Program Interface (API)4.
Data collection takes place daily. I identify
financial tweets as those that contain the “cashtag” $ symbol followed by a stock ticker. In my Python
program, I provide the tickers of all stocks listed on the NYSE, NYSE American and NASDAQ.
Twitter makes these tweets searchable and collectable for a period of approximately seven days5,
4 See https://dev.twitter.com for the Twitter API details.
5 A description of the availability of tweets for search is available at: https://dev.twitter.com/rest/public/search
after which they are irretrievable. For this reason, it is necessary to build the infrastructure used in
this paper, and to collect the tweets on a daily basis6. After I collect the tweets, I store them in a
SQL database in preparation for further analysis.
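The identification and storage steps described above can be illustrated with a minimal Python sketch. It is not the program used in this paper: the ticker universe, the cashtag pattern, the table layout, and the function names `extract_tickers` and `store_tweet` are all assumptions made for illustration, and the call to the Twitter API is omitted entirely.

```python
import re
import sqlite3

# Hypothetical ticker universe; the paper supplies every NYSE,
# NYSE American and NASDAQ ticker instead of this short list.
TICKERS = {"AMZN", "AAPL", "NFLX"}

# Assumed cashtag pattern: "$" followed by one to five letters.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")


def extract_tickers(text, universe=TICKERS):
    """Return the sorted list of known tickers that appear as cashtags."""
    found = {match.upper() for match in CASHTAG.findall(text)}
    return sorted(found & universe)


def store_tweet(conn, tweet_id, created_at, text):
    """Store one row per (tweet, ticker) pair in an assumed table layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id TEXT, ticker TEXT, created_at TEXT, text TEXT, "
        "PRIMARY KEY (tweet_id, ticker))"
    )
    tickers = extract_tickers(text)
    conn.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
        [(tweet_id, ticker, created_at, text) for ticker in tickers],
    )
    conn.commit()
    return tickers
```

Storing one row per tweet-ticker pair is a design choice of this sketch: it lets a tweet that mentions several cashtags count toward each firm when tweets are later aggregated by firm and day.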
In addition to the Twitter dataset, I obtain stock return and trading volume information
from CRSP, accounting details from COMPUSTAT, institutional ownership data from Thomson
Reuters 13F filings, and analyst information from I/B/E/S. I exclude firms from regulated
industries, financials, and those that have no Fama-French 48 industry classification. In addition,
I winsorize daily returns at the 0.5% and the 99.5% levels.
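As an illustration of the winsorization step, the following sketch clips a list of returns at its 0.5th and 99.5th percentiles. The percentile convention (linear interpolation between order statistics) and the function names are assumptions; the paper does not state which convention is used.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def winsorize(values, lower_pct=0.5, upper_pct=99.5):
    """Replace values beyond the two percentile cut-offs with the cut-offs."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(value, low), high) for value in values]
```

Winsorizing keeps every observation in the sample and only limits the influence of extreme daily returns, in contrast to trimming, which would drop them.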
Table 1 describes summary statistics about the sample of tweets collected. The sample
contains 18,319,583 tweets covering 2,292 firms in the period between January 1st, 2017 and
December 31st, 2017. The tweets originate from 1.02 million unique Twitter users. For a detailed
description of all variables used and their sources, please see Appendix A.
[Insert Table 1 here]
Interestingly, tweeters tweet financial information using numerous methods (devices).
Within the sample set, tweeters use more than 8,500 systems to tweet. As Table 2 illustrates, the
most common method used to tweet financial information is the Twitter website (25% of tweets),
which is not surprising. However, a substantial volume of tweeting originates from iPhone (12%)
and Android (9%) platforms. While it is generally thought that advances in technology are
allowing broader access to financial information, it appears that technological advancements, such
as mobile devices, are allowing market participants to generate financial information more easily.
6 Data that appears and disappears quickly is known as “high velocity data”: the data is available only for a
short period, after which it can no longer be retrieved.
[Insert Table 2 here]
One of the benefits of the technological infrastructure built to collect the data for this paper,
is that it allows for the collection of numerous details about each tweet including information about
the tweeter. For example, the sample includes information about the language of the tweeter. The
vast majority of tweets (92%) originate from English-language users. A small number of
tweets also originate in Russian, Spanish, French, German, Dutch, and Portuguese. Table 3
summarizes the top languages used by tweeters of financial information. It is not surprising that
most of the tweeting is in the English language for two reasons: firstly, most tweeting around the
globe is in English; secondly, the firms in the sample used in this paper are those listed on the major
American exchanges. The fact that Twitter is officially blocked in China7 likely explains the
absence of tweeting in Mandarin and Cantonese.
[Insert Table 3 here]
2.2 Financial tweeting daily and hourly distribution
Next, it is useful to examine the distribution of financial tweeting throughout the week.
Figure 1 shows the percentage of all tweeting on each day of the week. As the figure shows, less
tweeting takes place on the weekend. In particular, the volume of tweeting on Saturdays and
Sundays is about half of the volume on weekdays. The volume of tweeting is higher, and generally
somewhat similar for most week days – with Tuesdays exhibiting slightly more tweeting than other
week days.
[Insert Figure 1 here]
7 See http://www.businessinsider.com/websites-blocked-in-china-2015-7/#facebook-4
The breakdown of financial tweeting by hour of day is illustrated in Figure 2. As expected,
financial tweeting is highest during market hours and is generally lower outside of market hours.
There is also a period of elevated discussions just prior to and just after market hours.
[Insert Figure 2 here]
Naturally, not every firm is tweeted about equally often. For example,
Apple appears to be the firm most tweeted about in this sample, with 556,499 tweets. Other firms
in the sample include Amazon (458,891 tweets), Twitter Inc (297,282 tweets), Netflix (203,608
tweets) and Starbucks (58,002 tweets). Table 4 shows the total number of tweets for a subsample
of firms used in this study.
[Insert Table 4 here]
3. Determinants of tweeting about firms
3.1 Firm characteristics that predict tweeting volume
Given that firms are tweeted about with varying frequencies, I seek to identify the
characteristics associated with firms that are tweeted about frequently. For this purpose, I use the
following regression model:
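\[
\begin{aligned}
\ln(\text{Total Tweeting Volume})_{i} ={}& \alpha_1 + \beta_1 \cdot \text{Firm Beta}_{i} + \beta_2 \cdot \text{Book to Market Ratio}_{i} + \beta_3 \cdot \text{Firm Size}_{i} \\
&+ \beta_4 \cdot \text{Leverage}_{i} + \beta_5 \cdot \text{Payout ratio}_{i} + \beta_6 \cdot \text{Institutional Ownership}_{i} \\
&+ \beta_7 \cdot \text{Number of Analysts Following the Firm}_{i} \\
&+ \beta_8 \cdot \text{Dispersion of Analyst Forecasts}_{i} + \beta_9 \cdot \text{Industry}_{i} + \varepsilon_{i}
\end{aligned}
\]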
The dependent variable is the natural logarithm of the total number of tweets about a firm
in the full sample. Beta is the CAPM beta, and the book to market ratio is the ratio of the
firm’s book value to its market value. Firm size is the natural logarithm of the dollar value
of the firm’s shares. Leverage is the proportion of debt in the firm’s capital structure. Institutional
ownership represents the percentage of shares in the firm held by institutional investors. The
number of analysts following the firm is the number of unique analysts providing EPS estimates
for the firm. Finally, the dispersion of analyst forecasts is the standard deviation of the analysts’
forecasts scaled by the mean estimate. In addition, industry fixed effects are included in the model.
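The dispersion measure defined above can be sketched as follows. This is an illustration only: the use of the sample standard deviation, the scaling by the absolute value of the mean, and the treatment of firms with fewer than two forecasts or a zero mean are assumptions that the text does not specify.

```python
import statistics


def forecast_dispersion(eps_forecasts):
    """Standard deviation of analysts' EPS forecasts scaled by the mean estimate.

    Returns None when the measure is undefined (fewer than two forecasts,
    or a mean estimate of zero). Both conventions are assumptions.
    """
    if len(eps_forecasts) < 2:
        return None
    mean_estimate = statistics.mean(eps_forecasts)
    if mean_estimate == 0:
        return None
    # Sample standard deviation; scaling by |mean| keeps the measure positive.
    return statistics.stdev(eps_forecasts) / abs(mean_estimate)
```

For example, three forecasts of 0.80, 1.00 and 1.20 have a mean of 1.00 and a sample standard deviation of 0.20, giving a dispersion of 0.20.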
The results of this model are reported in Table 5 and show that the volume of tweeting about a firm
depends on many of the firm characteristics above. In particular, larger firms are tweeted about
more than smaller firms, which is consistent with the notion that investors are paying more
attention to larger firms. Firms with a higher CAPM beta are tweeted about more frequently than
firms with a lower beta, suggesting that riskier firms attract more discussions. Firms with lower
institutional ownership are tweeted about more than firms with greater institutional ownership.
This suggests that much of the tweeting of financial information originates from retail rather than
institutional investors. One somewhat surprising result is that firms with greater analyst coverage
are tweeted about more often than firms with less analyst coverage, but this may be because the
same factors that attract additional analyst coverage also attract additional retail interest.
However, where the dispersion of analyst forecasts is greatest, the volume of tweeting is also
higher. This suggests that in the absence of analysts’ consensus, investors tweet more about a stock
in what may be an attempt to resolve the difference of opinion.
[Insert Table 5 here]
3.2 Determinants of daily tweeting volume
After examining firm characteristics that predict the volume of investor tweeting about a
firm, I now focus on the determinants of daily tweeting volume. In particular, the goal is to uncover
the factors that lead to a large tweeting volume on a given day. For this purpose, I use the following
model:
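\[
\begin{aligned}
\ln(\text{Daily Tweeting Volume})_{it} ={}& \alpha_1 + \beta_1 \cdot \text{Firm return}_{it-1} + \beta_2 \cdot \text{Market return}_{it-1} + \beta_3 \cdot \text{VIX}_{it-1} \\
&+ \beta_4 \cdot \text{Earnings day}_{it} + \beta_5 \cdot \text{Week before earnings}_{it} \\
&+ \beta_6 \cdot \text{Week after earnings}_{it} \\
&+ \beta_7 \cdot \text{Tweeting volume on the previous day}_{it} + \beta_8 \cdot \text{Industry tweeting}_{it} \\
&+ \beta_9 \cdot \text{Firm fixed effects}_{i} + \varepsilon_{it}
\end{aligned}
\]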
This model is a panel regression model where the dependent variable is the natural
logarithm of the number of daily tweets about a firm. The independent variables are the firm’s
return on the previous trading day8, the market return on the previous trading day, and the volatility
index (VIX) on the previous trading day. Other independent variables include whether a given day is the
day when a firm announces its quarterly earnings, whether the day is in the week leading to the
earnings announcement, or the week following the earnings announcement. The model also
includes the firm’s tweeting volume on the previous day, as well as the tweeting volume for the
firm’s industry (based on the Fama-French 48 industry classification). Firm fixed effects are
included to account for the heterogeneity in firm characteristics. Standard errors are clustered by
firm and trading day as suggested by Petersen (2009).
8 Returns are calculated from the close of markets on the previous trading day to the close of markets on the current
trading day. This definition also corresponds to the definition of a ‘tweeting day’ which spans the same time from the
close of markets on the previous day to the close of markets on a given day.
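The close-to-close definition of a tweeting day can be sketched in Python as below. The sketch assumes timestamps already expressed in US Eastern time, a 4:00 p.m. market close, and that weekend tweets roll forward to the next weekday; exchange holidays are ignored. None of these details is stated in the text, and the function name `tweeting_day` is hypothetical.

```python
from datetime import datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumption: 4:00 p.m. Eastern close


def tweeting_day(timestamp):
    """Map a naive Eastern-time timestamp to its close-to-close tweeting day."""
    day = timestamp.date()
    # Tweets posted after the close belong to the next tweeting day.
    if timestamp.time() > MARKET_CLOSE:
        day += timedelta(days=1)
    # Roll Saturdays and Sundays forward to Monday (holidays are ignored).
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return day
```

Under these assumptions, a tweet posted at 6:00 p.m. on a Friday is assigned to the following Monday, the same day whose close-to-close return it could first be matched with.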
The results, documented in Table 6, show that tweeting volume about a firm is driven, in
part, by its return on the previous trading day. In particular, where the return on the previous day
is high, a firm is tweeted about more by investors. Interestingly, the coefficient on
$\text{Market return}_{it-1}$ is not statistically significant, suggesting that tweeting volume about a firm is
not dependent on the market’s previous day’s return, but only on the individual firm’s return.
Similarly, tweeting volume about a firm is not dependent on the volatility index (VIX) on the
previous day.
The coefficient on $\text{Earnings day}_{it}$ is positive and significant at the 1% level. This is
rather expected since investor attention is likely to be highest on the day of the firm’s earnings
announcements. Similarly, tweeting volume remains high during the week following earnings
releases, as suggested by the positive and significant coefficient on $\text{Week after earnings}_{it}$. In
addition, the volume of Twitter discussions is also high during the week leading to the day of
earnings announcements as suggested by the positive and significant coefficient on
$\text{Week before earnings}_{it}$. It is not surprising that the number of tweets about a stock is highest
during earnings season.
If tweeting volume is high on a given day, it is likely to be high on the next day, as
suggested by the positive and significant coefficient on $\text{Tweeting volume on the previous day}_{it}$.
This suggests autocorrelation of tweeting, or that tweeting about firms is “sticky”.
Finally, this model examines whether tweeting about a firm corresponds to tweeting
volume about other firms in the same industry. To construct the $\text{Industry tweeting}$ variable for
a given firm, I sum all the tweeting about all firms in the same Fama-French 48 industry
classification (excluding the firm in question). I then divide the total number of tweets by the total
number of firms in the industry (again excluding the firm in question) resulting in the average daily
tweeting volume for the industry. The coefficient on this variable is positive and significant at the
1% level of significance, suggesting that investors are likely to tweet about a firm on a given day
if they tweet about other firms in the same industry. This result is plausible insofar as investors are
likely to pay attention to firms in the same industry at the same time.
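The construction of the industry tweeting variable described above is a leave-one-out average, which can be sketched as follows for a single day. The dictionary-based inputs and the function name are hypothetical, and returning 0.0 for a firm with no industry peers is an assumption.

```python
def industry_tweeting(tweets_by_firm, industry_of, firm):
    """Average daily tweets per firm among the other firms in the same industry.

    tweets_by_firm: {firm: number of tweets on the day in question}
    industry_of:    {firm: industry label}
    """
    industry = industry_of[firm]
    # Peers are all firms in the industry except the firm in question.
    peers = [f for f, ind in industry_of.items() if ind == industry and f != firm]
    if not peers:
        return 0.0  # assumption: no peers means no industry tweeting
    return sum(tweets_by_firm.get(f, 0) for f in peers) / len(peers)
```

Excluding the firm itself from both the numerator and the denominator prevents the firm's own tweeting volume from appearing on both sides of the regression.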
[Insert Table 6 here]
4. Constructing the Social Negativity Index (SNI)
In this section, I construct the Social Negativity Index (SNI). The SNI is an index bounded
by the values 0 and 1, where 0 represents no negativity and 1 represents maximum negativity about a
stock on a given day. Sections 5 and 6 relate SNI to stock returns and trading volume.
Not all tweets are expected to have the same effect on asset prices; tweets that contain
positive information will affect markets differently from tweets containing negative information.
I conduct textual analysis to identify the tone or the linguistic sentiment of each tweet in the dataset.
A detailed procedure of the textual analysis algorithm used in this paper follows.
Textual analysis, as a subfield of finance, has gained prominence over the last decade.
Tetlock (2007) used the Harvard IV-4 Psychological Dictionary to analyze the tone of a popular
Wall Street Journal article. Tetlock showed that the aggregate sentiment of the article (identified by the
proportion of negative words contained in the article) affects stock returns for the subsequent
trading days. The SNI developed in this paper captures the negativity reflected on social media in
particular rather than print media.
Loughran and McDonald (2011) demonstrated that financial language is unique in
comparison to “normal” English language. For example, according to the Harvard Dictionary, a
word such as “cancer” or “debt” would be treated as a negative word. However, in financial
language, such words are not necessarily negative. For example, a pharmaceutical company
developing a drug for cancer will likely use the word ‘cancer’ extensively in its statements and
news. Similarly, the word ‘debt’ is used frequently without implying any negative meaning. For this
reason, Loughran and McDonald (2011) developed a new dictionary that is particularly suited to
analyzing financial language taking some of the above issues into account.
In this paper, the tweets from investors are likely to contain financial information by virtue
of being discussions about the stock of a firm. For this reason, I use the Loughran and McDonald
(2011) dictionary for the main analysis, and later use the Harvard Psychological Dictionary for
robustness.
To conduct the textual analysis, I use the Python programming language and obtain a copy
of the Loughran and McDonald (2011) dictionary9. The text of each tweet is analyzed individually
using the program. Each tweet receives a score corresponding to the count of the number of
positive and the number of negative words in the tweet. Tweets containing more positive than
negative words are deemed positive, and tweets containing more negative words are deemed
negative. Tweets containing an equal number of positive and negative words, as well as tweets
containing no key positive or negative words are deemed neutral. In the sample set, 1,956,800
tweets are identified as positive, 2,092,904 as negative, and the remaining tweets as neutral.
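The classification rule described above can be illustrated with a short Python sketch. The two word sets below are tiny hypothetical stand-ins, not the Loughran and McDonald (2011) word lists, and the tokenization (lowercasing and splitting on non-letters) is an assumption.

```python
import re

# Hypothetical stand-ins for the positive and negative word lists.
POSITIVE = {"gain", "strong", "profitable", "improve"}
NEGATIVE = {"loss", "weak", "decline", "lawsuit"}


def classify_tweet(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a tweet by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_count = sum(word in positive for word in words)
    negative_count = sum(word in negative for word in words)
    if positive_count > negative_count:
        return "positive"
    if negative_count > positive_count:
        return "negative"
    # Ties, including tweets with no dictionary words, are neutral.
    return "neutral"
```

Under this rule, a tweet such as "$XYZ strong but weak" is neutral because its single positive word is offset by a single negative word.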
In the next stage, I aggregate the tweets for each firm on each day. I add up the number of
positive tweets and negative tweets for each firm. Having identified the sentiment of each tweet
and knowing the total number of tweets for a given firm, I am now ready to define the SNI (for
each stock-day) as follows:
9 The dictionary of financial keywords is available on Bill McDonald’s website: https://www3.nd.edu/~mcdonald/
Word_Lists.html
\[
\text{Social Negativity Index (SNI)} = \frac{\text{Number of tweets with negative sentiment}}{\text{Total number of tweets about firm}}
\]
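The aggregation to the stock-day level can be sketched as follows. The input format, a sequence of (ticker, day, label) tuples, and the function name are hypothetical; stock-days with no tweets simply do not appear in the output, which is an assumption about how such days are handled.

```python
from collections import defaultdict


def social_negativity_index(labelled_tweets):
    """Compute SNI for each (ticker, day) from (ticker, day, label) tuples."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for ticker, day, label in labelled_tweets:
        totals[(ticker, day)] += 1
        if label == "negative":
            negatives[(ticker, day)] += 1
    # SNI = negative tweets / all tweets about the firm on that day.
    return {key: negatives[key] / totals[key] for key in totals}
```

Because neutral and positive tweets enter only the denominator, the index lies between 0 and 1 by construction.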
5. SNI, tweeting volume, and market reaction
5.1. SNI, tweeting volume and stock returns
In this section, I examine firms’ stock returns on a given market day to identify whether
SNI and tweeting volume correspond to stock returns. In Table 7, I conduct variations of the
following model (at a daily frequency):
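\[
\begin{aligned}
\text{Return (basis points)}_{it} ={}& \alpha_1 + \beta_1 \cdot \text{SNI}_{it} + \beta_2 \cdot \ln(\text{Number of tweets})_{it} + \beta_3 \cdot \text{Firm return}_{it-1} \\
&+ \beta_4 \cdot \text{Market return}_{t} + \beta_5 \cdot \text{Market return}_{t-1} + \beta_6 \cdot \text{Earnings day}_{it} \\
&+ \beta_7 \cdot \text{Week before earnings}_{it} + \beta_8 \cdot \text{Week after earnings}_{it} \\
&+ \beta_9 \cdot \text{VIX}_{t} + \beta_{10} \cdot \text{Day of the week fixed effects} \\
&+ \beta_{11} \cdot \text{Firm fixed effects}_{it} + \beta_{12} \cdot \text{Day fixed effects}_{it} + \varepsilon_{it}
\end{aligned}
\]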
In Table 7, I regress daily returns (in basis points) on SNI and the number of tweets
generated about each firm (the variables of interest). Control variables include the firm’s lagged
return, market return, lagged market return, and a dummy variable that takes a value of 1 if the day is
the firm’s quarterly earnings release, as well as dummy variables for the week before and the
week after a firm’s quarterly earnings announcement. I also include the volatility index (VIX) and
day of the week fixed effects. Furthermore, I include firm fixed effects, which account for the
heterogeneity of firm characteristics, and day fixed effects (in some of the models) which account
for daily market conditions. Standard errors are double clustered by firm and day.
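To illustrate what firm fixed effects do in a specification like the one above, the sketch below computes the within-firm slope of a single regressor by subtracting each firm's mean from both variables before fitting. It is a deliberately simplified illustration and not the estimation used in this paper: it handles one regressor only, omits the control variables, day fixed effects and double-clustered standard errors, and the function name is hypothetical.

```python
from collections import defaultdict


def within_slope(firms, x, y):
    """Slope of y on x after removing firm-specific means (within estimator)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # firm -> [sum x, sum y, count]
    for firm, xi, yi in zip(firms, x, y):
        entry = sums[firm]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1
    numerator = 0.0
    denominator = 0.0
    for firm, xi, yi in zip(firms, x, y):
        sum_x, sum_y, count = sums[firm]
        x_demeaned = xi - sum_x / count
        y_demeaned = yi - sum_y / count
        numerator += x_demeaned * y_demeaned
        denominator += x_demeaned * x_demeaned
    if denominator == 0:
        raise ValueError("x has no within-firm variation")
    return numerator / denominator
```

Demeaning by firm removes any time-invariant firm characteristic, so the slope is identified only from day-to-day variation within each firm.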
In the first model (1), I examine whether SNI corresponds to stock returns. As Table 7
shows, an increase of social negativity from 0 to 1 corresponds to a negative return of 35.5 basis
points. This result is significant at the 1% level. Furthermore, this result is robust in all the
specifications examined. In model (2), I focus on tweeting volume
$\ln(\text{Number of tweets})$ about a firm. The model shows that an increase of one unit in log tweeting
volume corresponds to a 25.6 basis point increase in returns. This result is also significant at the
1% level and robust in all specifications examined. To the extent that the number of Twitter
discussions corresponds to investor attention, this result can be seen in light of Barber and Odean
(2008), who reported that retail investors are net buyers of attention-grabbing stocks.
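Because tweeting volume enters the model in logarithms, a one-unit increase corresponds to multiplying the number of tweets by e (roughly 2.72). The following worked example, which assumes the reported coefficient of 25.6 basis points applies uniformly across the range of tweeting volume, translates it into more familiar changes:

```latex
\begin{aligned}
\Delta \text{Return} &= \beta_2 \cdot \Delta \ln(\text{Number of tweets}) \\
\text{Doubling of tweets:}\quad \Delta \text{Return} &= 25.6 \times \ln 2 \approx 25.6 \times 0.693 \approx 17.7 \text{ basis points} \\
\text{10\% more tweets:}\quad \Delta \text{Return} &= 25.6 \times \ln 1.1 \approx 25.6 \times 0.0953 \approx 2.4 \text{ basis points}
\end{aligned}
```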
In model 3, I combine SNI with tweeting volume in the same specification, and find that
the results are similar to the ones described above for each of tweeting volume and SNI. Taken
together, these results suggest that both SNI and tweeting volume correspond to stock returns. It
may be possible to think of tweeting volume as a proxy for attention, and of SNI as a proxy for
market sentiment.
Models 4-6 replicate the analysis of models 1-3 with the exception that day fixed effects are
included in the model. Because day fixed effects capture overall daily market conditions such as
market return and VIX, such variables are omitted from the control vector.
[Insert Table 7 here]
5.2 SNI, tweeting volume and trading volume
Having looked at the relationship between SNI, tweeting volume and returns, I now focus
on trading volume. The analysis is analogous to the one in section 5.1, but instead focuses on
trading volume. To examine this relationship, I use variations of the following model:
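\[
\begin{aligned}
\text{Trading volume}_{it} ={}& \alpha_1 + \beta_1 \cdot \text{SNI}_{it} + \beta_2 \cdot \ln(\text{Number of tweets})_{it} + \beta_3 \cdot \text{Firm return}_{it-1} \\
&+ \beta_4 \cdot \text{Firm trading volume}_{it-1} + \beta_5 \cdot \text{Market return}_{t} \\
&+ \beta_6 \cdot \text{Market return}_{t-1} + \beta_7 \cdot \text{Earnings day}_{it} \\
&+ \beta_8 \cdot \text{Week before earnings}_{it} + \beta_9 \cdot \text{Week after earnings}_{it} \\
&+ \beta_{10} \cdot \text{VIX}_{t} + \beta_{11} \cdot \text{Day of the week fixed effects} \\
&+ \beta_{12} \cdot \text{Firm fixed effects}_{it} + \beta_{13} \cdot \text{Day fixed effects}_{it} + \varepsilon_{it}
\end{aligned}
\]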
$\text{Trading volume}_{it}$ is the natural logarithm of the number of shares traded for a given firm
on a given day. The independent variables of interest are SNI and the number of tweets. In addition,
I use the same control variables used in the analysis of stock returns in section 5.1, with the addition
of lagged trading volume. As before, standard errors are clustered by firm and day.
The results of this analysis are reported in Table 8. Specification 1 examines the relationship
between SNI and trading volume. Interestingly, an increase in SNI corresponds to an increase in
trading volume. This suggests that investors trade more on days of increased negativity
(pessimism); necessarily, this also suggests that investors trade less on days of low negativity (high
optimism).
Specification 2 focuses on tweeting volume as the variable of interest. Not surprisingly,
greater tweeting volume corresponds to greater trading volume. I then combine SNI and tweeting
volume in specification 3 and find that the results remain consistent. Finally, I replicate the analysis
of specifications 1-3 in specifications 4-6 using day fixed effects and obtain similar results.
[Insert Table 8 here]
It is perhaps instructive to summarize these results, along with the return results from the
previous section. Days of increased investor tweeting activity correspond to greater
same-day stock returns as well as increased trading volume. Controlling for tweeting volume,
SNI corresponds to asset returns: greater negativity, expressed in the form of higher
SNI, corresponds to lower returns. Trading volume, on the other hand, increases as the negativity
expressed on social media increases.
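To give a sense of magnitudes, the short calculation below converts the model 3, Panel A coefficients from Tables 7 and 8 into the changes associated with two illustrative scenarios: a 0.10 rise in SNI and a doubling of the number of tweets. The scenarios and variable names are chosen here for illustration only, and the figures describe associations in the regressions rather than causal effects.

```python
import math

# Coefficients copied from model 3, Panel A of Tables 7 and 8.
SNI_ON_RETURN_BP = -36.86        # basis points per unit of SNI (SNI lies in [0, 1])
LN_TWEETS_ON_RETURN_BP = 25.70   # basis points per unit of ln(number of tweets)
SNI_ON_LOG_VOLUME = 0.04         # change in ln(shares traded) per unit of SNI
LN_TWEETS_ON_LOG_VOLUME = 0.19   # change in ln(shares traded) per unit of ln(tweets)

# Scenario 1: SNI rises by 0.10 (10 percentage points more negative tweets).
return_change_sni = SNI_ON_RETURN_BP * 0.10
volume_pct_sni = (math.exp(SNI_ON_LOG_VOLUME * 0.10) - 1) * 100

# Scenario 2: the number of tweets doubles, so ln(tweets) rises by ln(2).
return_change_doubling = LN_TWEETS_ON_RETURN_BP * math.log(2)
volume_pct_doubling = (math.exp(LN_TWEETS_ON_LOG_VOLUME * math.log(2)) - 1) * 100

print(f"SNI +0.10: return {return_change_sni:.2f} bp, volume {volume_pct_sni:+.2f}%")
print(f"Tweets x2: return {return_change_doubling:+.2f} bp, volume {volume_pct_doubling:+.2f}%")
```

Trading volume is measured in logs, so the volume coefficients are converted to percentage changes with the exponential function before being reported.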
5.3 Social Positivity rather than social negativity
The discourse thus far has focused on social negativity. An alternate framing of this exposition
can focus on positive language rather than negative language. In other words, it is possible to
construct a Social Positivity Index instead of the Social Negativity Index, where the unit of measure
is a positive tweet rather than a negative tweet. I repeat the analysis of Table 7 with the exception
that I use social positivity rather than social negativity, and report the results in Table 9.
[Insert Table 9 here]
Table 9 shows that, while social positivity corresponds to positive stock returns, the results
are both economically and statistically weak, suggesting that markets do not necessarily respond
favorably to positive tweeting. This is in contrast to negativity, which has a strong negative impact
on asset prices. Tetlock (2007) reported that markets respond more to negative language than to
positive language in the financial press. This paper demonstrates that Tetlock’s finding extends to
social media: markets respond more to negativity than to positivity. This finding is also
consistent with the psychology literature, which argues that negative information has greater
impact and is processed more thoroughly than positive information (Rozin and Royzman (2001),
Baumeister, Bratslavsky, Finkenauer, and Vohs (2001)).
6. SNI and vector autoregression (VAR) analysis
Thus far, the analysis has been focused on examining a single day. In the previous sections,
I showed that firm returns and trading volume correspond to tweeting volume, and more
importantly to the linguistic sentiment expressed in the form of SNI. Given that SNI is a daily
index and that returns are calculated at a daily frequency, a natural way to model the
interaction between the two variables is a panel vector autoregression (VAR) analysis.
The VAR methodology was used by Tetlock (2007) to examine the dynamics of returns
and sentiment. In Tetlock’s setting, the sentiment is updated daily. Similarly, in the setting of this
paper, SNI is updated daily. The VAR analysis offers two benefits. First, it allows us to determine
whether SNI has predictive power over returns on subsequent days. Second, it allows
us to determine whether returns reverse on days following SNI shocks.
In this panel VAR setting, I define two endogenous variables: returns and SNI. The VAR
model accounts for 5 lags of returns and SNI, representing roughly a trading week. For this purpose,
it is useful to define the Lag operator (Lx) as used in Tetlock (2007). The Lag operator of a variable
represents a vector consisting of x lags of the variable. For example, $L5(z_t)$ is the vector
$[z_{t-1}, z_{t-2}, z_{t-3}, z_{t-4}, z_{t-5}]$. I also use the 0 subscript to denote the inclusion of the contemporaneous
term as follows: $L5_0(z_t) = [z_t, z_{t-1}, z_{t-2}, z_{t-3}, z_{t-4}, z_{t-5}]$.
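The Lag operator can be sketched in a few lines. The function name, its arguments, and the sample series are hypothetical and introduced only for illustration. The sketch assumes the series is ordered in time and indexed from zero.

```python
def lag_vector(series, t, n_lags=5, include_current=False):
    """Return [z_{t-1}, ..., z_{t-n}], or [z_t, z_{t-1}, ..., z_{t-n}] if include_current."""
    if t - n_lags < 0:
        raise ValueError("not enough history for the requested number of lags")
    start = 0 if include_current else 1
    return [series[t - k] for k in range(start, n_lags + 1)]


# Made-up daily SNI values for one firm, oldest first.
sni = [0.10, 0.12, 0.08, 0.20, 0.15, 0.30, 0.25]

lags_only = lag_vector(sni, t=6)                           # five lags
with_current = lag_vector(sni, t=6, include_current=True)  # current day plus five lags
```

Here `lags_only` holds the five values preceding day 6, most recent first, and `with_current` prepends the value on day 6 itself.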
I run the following VAR model with the results summarized in Table 10.
\[
\text{Return}_{it} = \beta_1 \, L5_0(\text{SNI})_{it} + \beta_2 \, L5(\text{Return})_{it} + \beta_3 \, \text{Exog}_{it} + \varepsilon_{it}
\]
In this model, returns are daily firm returns in basis points. The current-day return is the
dependent variable, and 5 lags of returns are included as regressors. SNI is included with 5 lags in addition to the
contemporaneous term. The exogenous (control) variables include the market return with five
lags, contemporaneous and five lags of tweeting volume, dummy variables for earnings day, week
before earnings, week after earnings, and day of the week fixed effects.
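Written out lag by lag, the return equation can be read as follows. This is only an expansion of the compact Lag operator notation; the lag index $k$ and the double subscripts on the coefficients are introduced here for illustration and do not appear in the tables.

```latex
\[
\text{Return}_{it} = \sum_{k=0}^{5} \beta_{1,k} \, \text{SNI}_{it-k}
  + \sum_{k=1}^{5} \beta_{2,k} \, \text{Return}_{it-k}
  + \beta_3 \, \text{Exog}_{it} + \varepsilon_{it}
\]
```

Under this reading, $\beta_{1,0}$ captures the same-day relationship between SNI and returns, while $\beta_{1,1}$ through $\beta_{1,5}$ relate today's return to SNI on each of the previous five trading days.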
[Insert Table 10 here]
As Table 10 shows, when the system is exposed to an SNI shock (a high social negativity
shock), returns on the same day are reduced by 37 basis points. This figure is statistically
significant at the 1% level. The remainder of the table shows the returns 1 to 5
days after the SNI innovation. Importantly, we see no evidence of return reversals (or any major
changes for that matter), as suggested by the lack of statistical significance on each of these days. This
suggests that the price effect of social negativity expressed through SNI is permanent (at least for the duration of the
trading week). This finding is consistent with the findings of Tetlock, Saar-Tsechansky, and
Macskassy (2008), who show that news stories, or in this case tweets, have a permanent impact
on prices. The results contrast with the findings of Antweiler and Frank (2006), who show
that news stories about firms, regardless of tone, trigger an initial market response that is
later reversed.
7. Additional and robustness tests
7.1 Effect of the earnings announcement period
One of the aims of this paper is to show that aggregate tweeting volume and sentiment
about stocks contain stock-relevant information. The results suggest that this is the case. One
possible concern, however, is that the results may be driven by the earnings announcement period.
More specifically, it may be that aggregate tweeting contains useful information during earnings
announcement periods, but not outside of those periods.
This possibility is already addressed in the analysis of sections 5.1 and 5.2 in which dummy
variables for earnings day, week before earnings, and week after earnings are included. To address
this issue more directly, however, I conduct further analysis on the sample after removing the
earnings announcement period (earnings day, week before earnings, and week after earnings).
The results of this analysis are reported in panel B in Tables 7 and 8. This robustness test confirms
the main finding that aggregate tweeting about firms contains useful information on a daily
basis, and not only during earnings season. Solomon (2012) explains that the media plays a more
important role outside of earnings season than during earnings season because earnings season is
a time where information is already abundant. In the case of social media, it appears to play a role
both during and outside earnings season.
Tetlock, Saar-Tsechansky, and Macskassy (2008) show that the majority of news stories
published about firms are clustered close to days of earnings announcements. In the case of
tweeting, while significantly more tweeting occurs during earnings season (earnings day, week
before earnings day, and week after earnings day), much tweeting still occurs outside of this
window. Specifically, in the sample set, approximately 75% of tweeting occurs outside of earnings
season, while 25% of tweeting occurs during earnings season. Unlike print media, which is
physically constrained by space availability in the publication, tweeting does not face the same
constraints – this is an important distinction between print media and social media.
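The sample restriction used in this robustness test can be sketched as a simple date filter. The sketch assumes that "week" means seven calendar days on either side of the earnings day, which is an assumption made here rather than a definition taken from the paper, and it uses made-up dates and tweet counts. The function name is hypothetical.

```python
from datetime import date, timedelta


def in_earnings_period(day, earnings_days, window_days=7):
    """True if `day` is an earnings day or within `window_days` before or after one."""
    window = timedelta(days=window_days)
    return any(abs(day - earnings_day) <= window for earnings_day in earnings_days)


# Hypothetical firm-day observations for one firm: (date, number of tweets).
earnings_days = [date(2017, 4, 27)]
observations = [
    (date(2017, 4, 10), 40),
    (date(2017, 4, 21), 55),
    (date(2017, 4, 27), 120),
    (date(2017, 5, 3), 70),
    (date(2017, 5, 15), 35),
]

# Keep only firm-days outside the earnings announcement period.
outside = [(d, n) for d, n in observations if not in_earnings_period(d, earnings_days)]

# Share of tweets that fall outside the earnings announcement period.
share_outside = sum(n for _, n in outside) / sum(n for _, n in observations)
```

The toy numbers are not meant to reproduce the sample shares reported above; they only show how firm-days are dropped and how the share of tweets outside the window would be computed.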
7.2 Alternative definition of SNI
One of the central tenets of this paper is the definition of SNI. In all the preceding
analyses, I define SNI as the number of negative tweets divided by the total number of tweets
about a firm on a given day. One possible alternative definition of SNI is:
\[
\frac{\text{Number of negative tweets} - \text{Number of positive tweets}}{\text{Total number of tweets}}
\]
This alternate definition is different in that it directly accounts for the number of positive
tweets about a firm on a given day. I repeat the analysis of Tables 7 and 8 using this alternate
definition and find that the results are not affected by this choice. The results of this analysis are
shown in the Internet Appendix Tables IA.1 and IA.2.
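The two definitions can be compared side by side in a minimal sketch. The function names and the tweet counts are hypothetical. The sketch assumes that the total includes tweets classified as neither positive nor negative.

```python
def baseline_sni(negative, total):
    """Baseline SNI: negative tweets as a share of all tweets, between 0 and 1."""
    if total <= 0:
        raise ValueError("SNI is undefined for a firm-day with no tweets")
    return negative / total


def alternative_sni(negative, positive, total):
    """Alternative SNI: (negative - positive) / total, between -1 and 1."""
    if total <= 0:
        raise ValueError("SNI is undefined for a firm-day with no tweets")
    return (negative - positive) / total


# Hypothetical firm-day: 200 tweets, of which 30 are negative and 20 are positive.
baseline = baseline_sni(30, 200)
alternative = alternative_sni(30, 20, 200)
```

The baseline measure ignores positive tweets entirely, whereas the alternative measure nets them out and can therefore turn negative on days when positive tweets outnumber negative ones.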
7.3 Alternate dictionary for identifying sentiment
In the preceding analysis, I used the Loughran and McDonald (2011) dictionary to classify
tweets as positive or negative in tone. As explained, this dictionary is particularly accurate at
classifying the sentiment of financial language. Indeed, the tweets examined here are exclusively
financial. As a robustness test, however, I replicate the analysis of Table 7 using the Harvard
Psychological dictionary instead and report the results in Internet Appendix Table IA.3. The results
are generally consistent with those illustrated in Table 7.
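Dictionary-based tone classification, and the effect of swapping one dictionary for another, can be illustrated with a short sketch. The word lists below are hypothetical stand-ins and are not entries copied from either dictionary. The decision rule (more negative than positive words means a negative tweet) is also an assumption made for illustration and may differ from the rule used in the paper.

```python
import re

# Hypothetical stand-in word lists; NOT the actual contents of either dictionary.
NEGATIVE_WORDS = {"loss", "decline", "lawsuit", "weak"}
POSITIVE_WORDS = {"gain", "profit", "strong", "beat"}


def classify_tone(tweet, negative=NEGATIVE_WORDS, positive=POSITIVE_WORDS):
    """Label a tweet by comparing counts of negative and positive dictionary words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    n_negative = sum(word in negative for word in words)
    n_positive = sum(word in positive for word in words)
    if n_negative > n_positive:
        return "negative"
    if n_positive > n_negative:
        return "positive"
    return "neutral"


tone_a = classify_tone("$AAPL faces lawsuit after weak quarter")
tone_b = classify_tone("$NFLX strong subscriber gain")
tone_c = classify_tone("$IBM reports earnings today")

# Swapping in a different (again hypothetical) dictionary only changes the word sets.
tone_d = classify_tone("$GM shares plunge", negative={"plunge"}, positive={"soar"})
```

Passing the word lists as arguments keeps the classification rule fixed while the dictionary changes, which is the comparison this robustness test relies on.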
8. Conclusion
This paper illustrates that social negativity expressed on social media corresponds to asset
prices. In particular, the Social Negativity Index (SNI), which measures the daily aggregate
negativity about a stock expressed on social media, corresponds to negative returns. Using a VAR
analysis, I show that these results are not reversed on subsequent trading days. Unlike negativity,
social media positivity has a much weaker positive relationship to asset returns, suggesting that
markets react more to negativity than to positivity. This is consistent with the psychology
literature, which finds that humans react more to negative information than to positive information.
For the universe of stocks listed on the NYSE, NYSE American and NASDAQ, firms with
more retail ownership are tweeted about more than firms with less retail ownership, suggesting that
tweeting originates largely from the retail investor base. Moreover, firms with greater dispersion
of analysts’ forecasts are tweeted about more than firms with less dispersion. This suggests that
investors may turn to social media for discussions about stocks when analysts disagree.
Overall, the results in this paper suggest that social media has the capacity to assume many
of the roles of traditional print media. Furthermore, social media may aid in the information
production process – a function traditionally associated with financial analysts.
Looking forward, as Artificial Intelligence (AI) becomes an important emerging trend in
financial markets, social media will follow suit as a source of abundant information about financial
securities. This paper contributes to this emerging domain by showing that social media contains
useful information about financial markets. Indeed, AI has been used to monitor social media
discussions and predict acquisition activities – this trend is likely on the rise.
References
Acemoglu, D., Hassan, T., Tahoun, A., 2018. The power of the street: evidence from Egypt’s Arab
Spring. Review of Financial Studies 31(1): 1–42.
Al Guindy, M., 2016. Is corporate tweeting informative or is it just hype? Evidence from the SEC
social media regulation. Working paper.
Antweiler, W., Frank, M., 2004. Is all that talk just noise? The information content of Internet
stock message boards. Journal of Finance 59: 1259–1293.
Antweiler, W., Frank, M., 2006. Do U.S. stock markets typically overreact to corporate news
stories? Working paper, University of British Columbia.
Barber, B., Odean, T., 2008. All that glitters: the effect of attention and news on the buying
behavior of individual and institutional investors. Review of Financial Studies 21(2): 785–818.
Bartov, E., Faurel, L., Mohanram, P., 2016. Can Twitter help predict firm-level earnings and stock
returns? Working paper.
Baumeister, R., Bratslavsky, E., Finkenauer, C., Vohs, K., 2001, Bad is stronger than good.
Review of General Psychology 5, 323–370.
Blankespoor, E., Miller, G., White, H., 2014. The role of dissemination in market liquidity:
evidence from firms’ use of Twitter. The Accounting Review, 89(1), 79–112.
Bloomfield, D., 2016. Sanders’ tweet on drugmaker Ariad’s ‘Greed’ sends stock plunging
(October 14). Available at:
https://www.bloomberg.com/news/articles/2016-10-14/sanders-tweet-on-drugmaker-ariad-s-greed-sends-stock-plunging.
Bodnaruk, A., Loughran, T., McDonald, B., 2015. Using 10-k text to gauge financial constraints.
Journal of Financial and Quantitative Analysis 50 (4): 623–646.
Broniatowski D., Paul M., Dredze M., 2013. National and local influenza surveillance through
Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLoS ONE 8(12): e83672.
https://doi.org/10.1371/journal.pone.0083672
Chawla, N., Da, Z., Xu, J., Ye, M., 2015. Catching fire: the diffusion of retail attention on Twitter.
Working Paper, Notre Dame University.
Chen, H., De, P., Hu, Y., Hwang, B.H., 2014. Wisdom of crowds: the value of stock opinions
transmitted through social media. Review of Financial Studies 27, 1367–1403.
Chen, H., Hwang, B.H., Liu, B., 2016. The Economic consequences of having ‘social’ executives.
Working paper, City University of Hong Kong, Cornell University, and Florida State
University.
Da, Z., Engelberg, J., Gao, P., 2011. In search of attention. Journal of Finance 66 (5): 1461–1499.
Diamond, D., Verrecchia, R., 1991. Disclosure, liquidity, and the cost of equity capital. Journal of
Finance 46: 1325–60.
Drake, M., Roulstone, D., Thornock, J., 2012. Investor information demand: evidence from
Google searches around earnings announcements. Journal of Accounting Research 50(4):
1001–1040.
Egan, M., 2015. Hillary Clinton tweet crushes biotech stocks, CNN.com (September 22).
Available at: http://money.cnn.com/2015/09/21/investing/hillary-clinton-biotech-price-gouging.
Engelberg, J., Parsons, C., 2011. The causal impact of media in financial markets. Journal of
Finance 66 (1) 67–99.
Fama, E., French, K., 1992. The cross-section of expected stock returns. The Journal of Finance
47, 427–465.
Fang, L., Peress, J., 2009. Media coverage and the cross-section of stock returns. Journal of
Finance 64 (5): 2023–2052.
Jung, M., Naughton, J., Tahoun, A., Wang, C., 2017. Do firms strategically disseminate? Evidence
from corporate use of social media. The Accounting Review, Forthcoming.
Jung W., Kwon, Y., 1988. Disclosures when the market is unsure of information endowment of
managers. Journal of Accounting Research 26 (1): 146–153.
Loughran, T., McDonald, B., 2011. When is a liability not a liability? Textual analysis,
dictionaries, and 10-Ks. Journal of Finance 66, 35–65.
Loughran, T., McDonald, B., 2016. Textual analysis in accounting and finance: a survey. Journal
of Accounting Research (forthcoming).
Loughran, T., McDonald B., 2017. The use of EDGAR filings by investors. Journal of Behavioral
Finance 18: 231–248.
Lovelace, B., 2016. Donald Trump just took a shot at Boeing in Trump Tower, CNBC.com (December
6). Available at:
https://www.cnbc.com/2016/12/06/boeing-shares-slide-after-trump-says-air-force-ones-cost-out-of-control.html.
Petersen, M., 2009. Estimating standard errors in finance panel data sets: comparing approaches. Review
of Financial Studies 22 (1): 435–480.
Q4 Web Systems, 2013. New Q4 Whitepaper: Public Company Use of Social Media for IR – Part
1 Twitter & StockTwits (August 15). Available at:
http://www.q4blog.com/2013/08/15/new-2013-q4-whitepaper-public-company-use-of-social-media-for-ir-part-1-twitter-stocktwits/
Ram, A., Wigglesworth, R., 2017. When Silicon Valley came to Wall Street. Financial Times (Oct
28).
Rich, M., 2017. Trump’s Twitter warning to Toyota unsettles Japanese carmaker. New York Times
(January 6).
Rozin, P., Royzman, E., 2001. Negativity bias, negativity dominance, and contagion. Personality
and Social Psychology Review 5, 296–320.
Scannell, K., 2013. Companies allowed to tweet #USearnings. Financial Times (April 2).
Securities and Exchange Commission (SEC), 2008. Commission guidance on the use of company
websites. Release No. 34–58288. Washington, D.C.: SEC.
Securities and Exchange Commission (SEC), 2013. SEC says social media ok for company
announcements if investors are alerted. Press Release 2013–51. Washington, D.C.: SEC.
Seetharaman, D., Wells, G., 2017. Hurricane Harvey victims turn to social media for assistance.
The Wall Street Journal (August 29).
Solomon, D., 2012. Selective publicity and stock prices. Journal of Finance 67 (2): 599–637.
Tetlock, P.C., 2007. Giving content to investor sentiment: the role of media in the stock market.
Journal of Finance 62, 1139–1168.
Tetlock, P.C., Saar-Tsechansky, M., Macskassy, S., 2008. More than words: quantifying language
to measure firms’ fundamentals. Journal of Finance, 63 (3), 1437–1467.
Verrecchia, R., 1983. Discretionary disclosure. Journal of Accounting and Economics 5 (3), 179–
194.
Wang, C., 2016. Biotech takes a hit after Clinton tweets about EpiPen pricing, CNBC.com (August
24). Available at:
https://www.cnbc.com/2016/08/24/biotech-gains-amid-buyout-chatter-upbeat-clinical-trial-results.html.
Wang, C., 2016b. Lockheed Martin shares take another tumble after Trump tweet, CNBC.com
(December 22). Available at:
https://www.cnbc.com/2016/12/22/lockheed-martin-shares-take-another-tumble-after-trump-tweet.html.
Appendix A: Regression variable definitions and data sources
Panel A: Dependent Variables

| Variable | Definition | Source |
|---|---|---|
| Return | Daily return on company’s common share calculated on a 24-hour basis | CRSP |
| Trading volume | Natural logarithm of the number of shares traded | CRSP |

Panel B: Control Variables

| Variable | Definition | Source |
|---|---|---|
| Beta | The result of the regression of firms’ monthly excess return on the excess return of the CRSP value-weighted portfolio using a 60-month rolling window defined in June of each year. Excess return is defined as the monthly return above the one-month treasury bill. | Author’s calculation from CRSP returns data |
| Book to market ratio | The ratio of book value of equity to the market value of equity. The book value is defined as: [the book value of shareholders’ equity + deferred taxes and investment tax credit – Book value of preferred stocks] | Author’s calculation from COMPUSTAT data |
| Leverage | The ratio of the firm’s long term debt to the total assets of the firm | Author’s calculation from COMPUSTAT data |
| Ln (Size) | The natural logarithm of the market value of the firm’s equity (in millions of dollars). | Author’s calculation from COMPUSTAT data |
| Analyst following | The number of analysts providing one-year EPS estimates for the stock | Author’s calculation from I/B/E/S data |
| Dispersion of forecasts | The dispersion of analyst forecasts is the standard deviation of analysts’ one-year ahead forecasts scaled by the mean of estimates | Author’s calculation from I/B/E/S data |
| Institutional ownership | The total percentage of the company’s shares that are held by institutional investors | Author’s calculation from Thomson Reuters 13F |
| Payout | This is the ratio of the firm’s net income paid as dividends. Defined as common dividends/net income | Author’s calculation from COMPUSTAT |
| Industry | The industry membership of the firm in one of the Fama French 48 industry classifications | Determined from CRSP historical SIC codes and Kenneth French’s website (to convert SIC to FF 48) |
| Market return | The average daily value-weighted market return (vwretd) | CRSP |
| VIX | CBOE S&P 500 Volatility Index | CBOE Indexes |
| Earnings day | A binary variable that takes the value of 1 on a firm’s earnings announcements day | Author’s calculation from Compustat |
| Week before earnings | A binary variable that takes the value of 1 for the week prior to a firm’s earnings announcement day | Author’s calculation from Compustat |
| Week after earnings | A binary variable that takes the value of 1 for the week following a firm’s earnings announcement | Author’s calculation from Compustat |

Panel C: Twitter Variables

| Variable | Definition | Source |
|---|---|---|
| Tweeting volume | The natural logarithm of the number of tweets about a firm on a given day | Twitter API/author’s calculation |
| Social Negativity Index (SNI) | Number of negative tweets about a stock on a given trading day divided by the total number of tweets for that day (bounded by 0 and 1). | Twitter API/author’s calculation |
| Tweeting on previous day | A binary variable that takes the value of 1 if the firm tweeted on the previous day | Twitter API/author’s calculation |
| Industry tweeting | A unit variable that represents the proportion of tweeting firms from a given industry on a given day (excluding the given firm) | Author’s calculation |
Figure 1: Tweeting distribution by day of the week. This figure shows the breakdown of financial
tweets (in percentages) by day of the week in the sample period. The tweets are those that strictly
discuss financial information.
[Figure 1: bar chart. Horizontal axis: Day of the week (Sunday through Saturday). Vertical axis: % of all tweets (scale 0% to 20%).]
Figure 2: Tweeting distribution by hour of day. This figure shows the breakdown of financial tweets
(in percentages) by hour of day. The sample includes a total of 12,440,121 financial tweets collected
between January 1st 2017 and October 1st 2017. The tweets are those that strictly discuss financial
information.
[Figure 2: bar chart. Horizontal axis: Hour (0-1 through 23-24). Vertical axis: % of all tweets (scale 0% to 9%).]
Table 1
Twitter sample descriptive statistics
This table provides general descriptive statistics for the sample of tweets. The tweets include all
financial tweets in which a Twitter user mentions a firm’s stock using the $ symbol and the stock
ticker, indicating that the tweet strictly discusses a firm’s stock. Firms are those listed on the NYSE,
AMEX, and NASDAQ.
| Item | Value |
|---|---|
| Sample period | January 1st, 2017 - December 31st, 2017 |
| Number of tweets | 18,319,583 |
| Number of firms covered | 2,292 |
| Number of unique tweeters | 1,021,106 |
| Number of tweets classified as positive in tone | 1,956,880 |
| Number of tweets classified as negative in tone | 2,092,904 |
Table 2
Modes of tweeting financial information
This table shows the mode of communication used by Twitter users to tweet financial information.
The Twitter system records the device/method used by the tweeting user. The tweets include all
financial tweets in which a Twitter user mentions a firm’s stock using the $ symbol and the stock
ticker, indicating that the tweet strictly discusses a firm’s stock. Firms are those listed on the NYSE,
AMEX, and NASDAQ.
| Mode | % of all tweets |
|---|---|
| Twitter website | 24.75% |
| IFTTT (web-based service) | 16.23% |
| Twitter for iPhone | 12.15% |
| Twitter for Android | 8.92% |
Table 3
Top financial tweeting users’ languages
This table depicts the languages used by Twitter users tweeting financial information about stocks
listed on the NYSE, AMEX, or NASDAQ. A Twitter user indicates their language when they sign
up for a Twitter account and this data is summarized below for the dataset used in this paper.
| Language | % of all tweets |
|---|---|
| English | 92.28% |
| Russian | 1.39% |
| Spanish | 1.30% |
| French | 0.79% |
| German | 0.59% |
| Dutch | 0.57% |
| Portuguese | 0.49% |
Table 4
Number of tweets for a sample of firms in the dataset
This table depicts the number of tweets in which a firm is mentioned in a financial tweet for a subset of
firms within the sample period. A tweet is identified to belong to a firm when it contains the $ symbol and
the firm ticker (e.g. $AAPL). This signifies that the tweet strictly discusses the stock of the firm.
| Firm | Number of tweets |
|---|---|
| Apple Inc. | 556,499 |
| Amazon Inc. | 458,891 |
| Twitter Inc. | 297,282 |
| Nvidia Corp. | 221,653 |
| Netflix Inc. | 203,608 |
| IBM | 84,170 |
| General Motors | 71,934 |
| Starbucks | 58,002 |
Table 5
Determinants of tweeting about a firm given firm characteristics
This table depicts the predictability of the volume of tweets about a firm given lagged firm characteristics. The sample
covers all firms listed on NYSE, AMEX, and NASDAQ. The dependent variable of the regression takes the value of
100*ln(total number of tweets about a firm). The independent variables are previous year’s parameters: Beta, which
is the CAPM beta; B/M represents the book to market ratio of equity; Size is the natural logarithm of the market value
of equity; Leverage is the leverage ratio of the firm; Payout is the payout ratio, Institution is the percentage of shares
held by institutional investors; Analysts is the number of analysts following the firm. Dispersion is the standard
deviation of analyst forecasts scaled by the absolute value of the mean of forecasts in percentage points. Fama and
French 48 industry fixed effects are also included. ***, **, * denote statistical significance at the 1%, 5%, and 10% levels
respectively. Standard errors are reported in parentheses.

| | Number of tweets about firm |
|---|---|
| Beta | 0.02** (0.01) |
| B/M | 0.01 (0.02) |
| Size | 0.09*** (0.004) |
| Leverage | 0.05 (0.03) |
| Payout | -0.05* (0.03) |
| Institution | -0.64*** (0.11) |
| Analysts | 4.25*** (0.29) |
| Dispersion | 1.04*** (0.40) |
| Industry fixed effects | Included |
| Adjusted R2 | 0.65 |
| N | 1491 |
Table 6
Determinants of tweeting volume about a firm on a given day
This table documents the predictability of the number of tweets about a given firm on a trading day. The dependent
variable is the natural logarithm of the number of tweets about a firm. Estimates are from a panel regression with firm
fixed effects. Firm’s Returnt-1 is the firm’s return on the previous trading day expressed in percentage points. Market
return t-1 is the market return on the previous trading day expressed in percentage points. VIX t-1 is the previous day’s
volatility index. Earnings day is the day of the firm’s earnings announcement, Week Before Earnings is the week prior
to the firm’s earnings announcement. Week After Earnings is the week after the firm’s earnings announcement.
Tweeting on previous day is the natural logarithm of the firm’s number of tweets on the previous trading day. Industry
tweeting represents the natural logarithm of the average number of tweets per firm in the same industry on a given
day. ***, **, * denote statistical significance at the 1%, 5%, and 10% levels respectively. Standard errors are reported
in parentheses.

| | Number of tweets about firm |
|---|---|
| Firm’s Return t-1 | 0.02*** (0.001) |
| Market return t-1 | 0.01 (0.03) |
| VIX t-1 | 0.01 (0.01) |
| Earnings Day | 0.83*** (0.03) |
| Week Before Earnings | 0.04* (0.02) |
| Week After Earnings | 0.36*** (0.02) |
| Tweeting on previous day | 0.30*** (0.01) |
| Industry tweeting | 0.49*** (0.03) |
| Adjusted R2 | 0.64 |
| N | 459620 |
Table 7
Returns, tweeting volume and Social Negativity Index (SNI)
This table documents the results of the panel regression of returns (in basis points) on the Social Negativity Index
(SNI), and tweeting volume about a given firm. Social Negativity Index (SNI) is the proportion of tweets with negative
sentiment about a firm on a given day. Tweeting volume is the natural logarithm of the number of tweets about a firm
on a given day. Control variables used but not shown in the table are: Lag_return is the firm’s previous day’s return.
Earnings day is the day of the firm’s earnings announcement; Week before earnings is the week prior to the firm’s
earnings announcement. Week after earnings is the week after the firm’s earnings announcement. Models 1, 2, and 3
also include the volatility index (VIX), market return on the previous day, and day of the week fixed effects. Firm
fixed effects are used in all the models, and models 4-6 include day fixed effects. Panel A shows the results for the
full sample while Panel B shows the result for the sample excluding earnings season. Standard errors, in parentheses,
are clustered by firm and day. ***, **, * denote statistical significance at the 1%, 5%, and 10% levels respectively.

Panel A: Full sample

| Returns (basis points) | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Social Negativity Index (SNI) | -35.52*** | ----- | -36.86*** | -36.72*** | ----- | -38.62*** |
| | (2.96) | | (3.11) | (2.56) | | (2.72) |
| Tweeting volume | ----- | 25.60*** | 25.70*** | ----- | 26.18*** | 26.33*** |
| | | (1.42) | (1.42) | | (1.24) | (1.24) |
| Controls | Included | Included | Included | Included | Included | Included |
| Day fixed effects | No | No | No | Included | Included | Included |
| Firm fixed effects | Included | Included | Included | Included | Included | Included |
| R2 | 0.055 | 0.060 | 0.061 | 0.070 | 0.075 | 0.075 |
| N | 461632 | 461632 | 461632 | 461632 | 461632 | 461632 |

Panel B: Sample excluding earnings period

| Returns (basis points) | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Social Negativity Index (SNI) | -27.98*** | ----- | -29.28*** | -29.68*** | ----- | -31.60*** |
| | (3.02) | | (3.13) | (2.67) | | (2.77) |
| Tweeting volume | ----- | 25.00*** | 25.08*** | ----- | 26.36*** | 26.49*** |
| | | (1.50) | (1.50) | | (1.33) | (1.33) |
| Controls | Included | Included | Included | Included | Included | Included |
| Day fixed effects | No | No | No | Included | Included | Included |
| Firm fixed effects | Included | Included | Included | Included | Included | Included |
| R2 | 0.063 | 0.068 | 0.068 | 0.079 | 0.084 | 0.085 |
| N | 377064 | 377064 | 377064 | 377064 | 377064 | 377064 |
Table 8
Trading volume, tweeting volume and Social Negativity Index (SNI)
This table documents the results of the panel regression of trading volume, defined as the natural logarithm of the
number of shares traded, on the Social Negativity Index (SNI), and tweeting volume about a given firm. Social Negativity
Index (SNI) is the proportion of tweets with negative sentiment about a firm on a given day. Tweeting volume is the natural
logarithm of the number of tweets about a firm on a given day. Control variables used but not shown in the table are: Lag_return,
which is the firm’s previous day’s return, Lag trading volume, which is the trading volume of the firm’s stocks on the
previous day. Earnings day is the day of the firm’s earnings announcement; Week before earnings is the week prior
to the firm’s earnings announcement. Week after earnings is the week after the firm’s earnings announcement. Models
1, 2, and 3 also include the volatility index (VIX), market return on the previous day, and day of the week fixed effects.
Firm fixed effects are used in all the models, and models 4-6 include day fixed effects. Panel A shows the results for
the full sample while Panel B shows the result for the sample excluding earnings season. Standard errors, in
parentheses, are clustered by firm and day. ***, **, * denote statistical significance at the 1%, 5%, and 10% levels
respectively.
Panel A: Full sample

| Trading volume | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Social Negativity Index (SNI) | 0.05*** | ----- | 0.04*** | 0.05*** | ----- | 0.04*** |
| | (0.01) | | (0.01) | (0.01) | | (0.01) |
| Tweeting volume | ----- | 0.19*** | 0.19*** | ----- | 0.21*** | 0.21*** |
| | | (0.01) | (0.01) | | (0.01) | (0.01) |
| Controls | Included | Included | Included | Included | Included | Included |
| Day fixed effects | No | No | No | Included | Included | Included |
| Firm fixed effects | Included | Included | Included | Included | Included | Included |
| R2 | 0.90 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 |
| N | 460598 | 460598 | 460598 | 460598 | 460598 | 460598 |

Panel B: Sample excluding earnings period

| Trading volume | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Social Negativity Index (SNI) | 0.05*** | ----- | 0.04*** | 0.05*** | ----- | 0.04*** |
| | (0.01) | | (0.01) | (0.01) | | (0.01) |
| Tweeting volume | ----- | 0.17*** | 0.17*** | ----- | 0.19*** | 0.19*** |
| | | (0.01) | (0.01) | | (0.01) | (0.01) |
| Controls | Included | Included | Included | Included | Included | Included |
| Day fixed effects | No | No | No | Included | Included | Included |
| Firm fixed effects | Included | Included | Included | Included | Included | Included |
| R2 | 0.90 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 |
| N | 376177 | 376177 | 376177 | 376177 | 376177 | 376177 |
Table 9
Returns, tweeting volume and Social Positivity
This table documents the results of the panel regression of returns (in basis points) on the Social Positivity Index, and
tweeting volume about a given firm. Social Positivity Index is the proportion of tweets with positive sentiment about
a firm on a given day. Tweeting volume is the natural logarithm of the number of tweets about a firm on a given day.
Control variables used but not shown in the table are: Lag_return is the firm’s previous day’s return. Earnings day is
the day of the firm’s earnings announcement; Week before earnings is the week prior to the firm’s earnings
announcement. Week after earnings is the week after the firm’s earnings announcement. Models 1 and 2 also
include the volatility index (VIX), market return on the previous day, and day of the week fixed effects. Firm fixed
effects are used in all the models, and models 3 and 4 include day fixed effects. Panel A shows the results for the full
sample while Panel B shows the result for the sample excluding earnings season. Standard errors, in parentheses, are
clustered by firm and day. ***, **, * denote statistical significance at the 1%, 5%, and 10% levels respectively.
Panel A: Full sample
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Positivity Index           3.15       7.92**     3.88*      9.45***
                                  (3.35)     (3.43)     (2.26)     (2.29)
Tweeting volume                   -----      25.68***   -----      26.29***
                                             (1.42)                (1.23)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.055      0.060      0.070      0.075
N                                 461632     461632     461632     461632

Panel B: Sample excluding earnings period
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Positivity Index           1.52       5.40       1.42       6.27**
                                  (3.49)     (3.65)     (2.44)     (2.46)
Tweeting volume                   -----      25.04***   -----      26.43***
                                             (1.50)                (1.32)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.062      0.068      0.078      0.084
N                                 377064     377064     377064     377064
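The two regressors in Table 9 are simple firm-day aggregates, and the sketch below shows one way they could be computed from tweet-level data. It is a minimal illustration, not the paper's pipeline: the function name `firm_day_measures`, the sentiment labels, the tickers and the sample tweets are all hypothetical.

```python
import math
from collections import defaultdict


def firm_day_measures(tweets):
    """Aggregate labelled tweets to the firm-day level.

    tweets: iterable of (firm, day, sentiment) tuples, where sentiment is
    "positive", "negative" or "neutral" (hypothetical labels).
    Returns {(firm, day): {"positivity": ..., "tweet_volume": ...}}.
    """
    counts = defaultdict(lambda: {"total": 0, "positive": 0})
    for firm, day, sentiment in tweets:
        cell = counts[(firm, day)]
        cell["total"] += 1
        if sentiment == "positive":
            cell["positive"] += 1

    measures = {}
    for key, cell in counts.items():
        measures[key] = {
            # Proportion of the day's tweets about the firm that are positive.
            "positivity": cell["positive"] / cell["total"],
            # Natural logarithm of the number of tweets about the firm.
            "tweet_volume": math.log(cell["total"]),
        }
    return measures


# Made-up example data, not taken from the paper's sample.
sample = [
    ("AAA", "2017-08-28", "positive"),
    ("AAA", "2017-08-28", "negative"),
    ("AAA", "2017-08-28", "positive"),
    ("AAA", "2017-08-28", "neutral"),
    ("BBB", "2017-08-28", "negative"),
]
result = firm_day_measures(sample)
```

In the made-up data, firm AAA has two positive tweets out of four, so its positivity is 0.5 and its tweeting volume is ln(4). Firm-days with no tweets never enter the dictionary, so how such days are treated in a regression sample remains a separate choice.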
Table 10
Vector autoregression of returns and Social Negativity Index (SNI)
This table reports estimates from panel vector autoregressions:
$y_{it} = \alpha_i + \sum_{j=1}^{5} \beta_j \, y_{i,t-j} + \beta_6 \, Exog_{it} + \varepsilon_{it}$.
The coefficients are obtained using system GMM estimation. The dependent variables are returns and the Social
Negativity Index (SNI). SNI is calculated as the proportion of tweets of negative sentiment about a firm on a given day
relative to the body of tweets about the firm. The model focuses on the effect on returns of a shock to SNI.
Exogenous variables used (but not listed) are: ln_tweetCount, the natural logarithm of the number of tweets
about a firm (including five lags); Market return, the daily market return (including five lags); VIX, the volatility
index; Earnings day, the day of the earnings announcement; and Week before earnings and Week after earnings, the weeks
before and after the earnings announcement. Day of the week fixed effects are also included. ***, **, * denote statistical
significance at the 1%, 5%, and 10% levels respectively. Standard errors are reported in parentheses.
Social Negativity Index (SNI)     Dep. variable: Returns (basis points)
Tweeting Day t                    -37.32***
                                  (2.26)
Tweeting Day t-1                  0.55
                                  (2.27)
Tweeting Day t-2                  0.19
                                  (2.28)
Tweeting Day t-3                  -2.69
                                  (2.27)
Tweeting Day t-4                  -0.94
                                  (2.27)
Tweeting Day t-5                  0.19
                                  (2.25)
N                                 459633
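The lag structure in Table 10 means each firm-day observation is paired with the same firm's five preceding days. The sketch below illustrates only that alignment step. It does not implement the system GMM estimator, and the function name `build_lagged_rows`, the tickers and the numbers are hypothetical.

```python
def build_lagged_rows(series_by_firm, n_lags=5):
    """Return one row per firm-day that has n_lags earlier observations.

    series_by_firm: {firm: [y_1, y_2, ...]} ordered by trading day.
    Each row is (firm, t, y_t, [y_{t-1}, ..., y_{t-n_lags}]), where t is
    the zero-based position of the day within the firm's own series.
    """
    rows = []
    for firm, values in series_by_firm.items():
        # Start at n_lags so that every lag refers to the same firm.
        for t in range(n_lags, len(values)):
            lags = [values[t - j] for j in range(1, n_lags + 1)]
            rows.append((firm, t, values[t], lags))
    return rows


# Made-up daily returns in basis points for two hypothetical firms.
panel = {
    "AAA": [10, -5, 3, 8, -2, 4, 7],
    "BBB": [1, 2, 3, 4, 5, 6],
}
rows = build_lagged_rows(panel)
```

Building lags firm by firm keeps one firm's history from leaking into another firm's row. The first five days of each firm drop out because their lags are incomplete, which is one reason a lagged sample can be smaller than the corresponding panel regression sample.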
Internet Appendix
to the paper
Towards a Social Negativity Index: Giving Content to Financial
Tweeting
Table IA. 1
Returns, tweeting volume and alternate Social Negativity Index (SNI) definition
This table documents the results of the panel regression of returns (in basis points) on the Social Negativity Index
(SNI) and tweeting volume about a given firm. In this table, SNI is defined
as [(number of negative tweets about a firm on a given day - number of positive tweets about a firm on a given
day)/Total number of tweets about a firm on a given day]. Tweeting volume is the natural logarithm of the number of
tweets about a firm on a given day. Control variables used but not shown in the table are: Lag_return, the firm's
previous day's return; Earnings day, the day of the firm's earnings announcement; Week before earnings, the week
prior to the firm's earnings announcement; and Week after earnings, the week after the firm's earnings announcement.
Models 1 and 2 also include the volatility index (VIX), market return on the previous day, and day of the week
fixed effects. Firm fixed effects are used in all the models, and models 3 and 4 include day fixed effects. Panel A shows
the results for the full sample while Panel B shows the results for the sample excluding earnings season. Standard
errors, in parentheses, are clustered by firm and day. ***, **, * denote statistical significance at the 1%, 5%, and 10%
levels respectively.
Panel A: Full sample
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     -17.20***  -19.91***  -18.14***  -21.45***
(alternate definition)            (2.03)     (2.12)     (1.57)     (1.63)
Tweeting volume                   -----      25.84***   -----      26.51***
                                             (1.42)                (1.24)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.055      0.061      0.070      0.075
N                                 461632     461632     461632     461632

Panel B: Sample excluding earnings period
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     -12.94***  -15.24***  -13.71***  -16.72***
(alternate definition)            (2.16)     (2.26)     (1.71)     (1.74)
Tweeting volume                   -----      25.17***   -----      26.62***
                                             (1.50)                (1.32)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.062      0.068      0.079      0.084
N                                 377064     377064     377064     377064
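The alternate definition used in Table IA. 1 nets positive tweets against negative ones, whereas the definition given with Table 10 uses only the proportion of negative tweets. The sketch below places the two side by side for a single firm-day. The function names and the counts are hypothetical and serve only to show the arithmetic.

```python
def sni_base(n_negative, n_total):
    """Proportion of negative tweets among all tweets about the firm."""
    return n_negative / n_total


def sni_alternate(n_negative, n_positive, n_total):
    """Net negativity: (negative - positive) / total tweets about the firm."""
    return (n_negative - n_positive) / n_total


# Made-up counts for one hypothetical firm-day.
n_neg, n_pos, n_tot = 30, 10, 100
base = sni_base(n_neg, n_tot)
alt = sni_alternate(n_neg, n_pos, n_tot)
```

With 30 negative and 10 positive tweets out of 100, the proportion-based index is 0.3 and the alternate index is 0.2. The proportion-based index lies between 0 and 1, while the alternate index lies between -1 and 1 and turns negative whenever positive tweets outnumber negative ones. Both functions raise `ZeroDivisionError` for a firm-day with no tweets, so such days would need separate handling.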
Table IA. 2
Trading volume, tweeting volume and alternate Social Negativity Index (SNI) definition
This table documents the results of the panel regression of trading volume, defined as the natural logarithm of the
number of shares traded, on the Social Negativity Index (SNI) and tweeting volume about a given firm. Tweeting
volume is the natural logarithm of the number of tweets about a firm on a given day. In this table, SNI is defined as
[(number of negative tweets about a firm on a given day - number of positive
tweets about a firm on a given day)/Total number of tweets about a firm on a given day]. Control variables used but
not shown in the table are: Lag_return, the firm's previous day's return; Lag trading volume, the trading volume of
the firm's stock on the previous day; Earnings day, the day of the firm's earnings announcement; Week before
earnings, the week prior to the firm's earnings announcement; and Week after earnings, the week after the firm's
earnings announcement. Models 1 and 2 also include the volatility index (VIX), market return on the previous day,
and day of the week fixed effects. Firm fixed effects are used in all the models, and models 3 and 4 include day fixed
effects. Panel A shows the results for the full sample while Panel B shows the results for the sample excluding earnings
season. Standard errors, in parentheses, are clustered by firm and day. ***, **, * denote statistical significance at the
1%, 5%, and 10% levels respectively.
Panel A: Full sample
Dependent variable: Trading volume
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     0.05***    0.04***    0.05***    0.03***
(alternate definition)            (0.01)     (0.01)     (0.01)     (0.004)
Tweeting volume                   -----      0.19***    -----      0.21***
                                             (0.01)                (0.01)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.90       0.91       0.91       0.91
N                                 460598     460598     460598     460598

Panel B: Sample excluding earnings period
Dependent variable: Trading volume
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     0.05***    0.04***    0.05***    0.03***
(alternate definition)            (0.01)     (0.01)     (0.01)     (0.005)
Tweeting volume                   -----      0.17***    -----      0.19***
                                             (0.01)                (0.01)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.90       0.91       0.91       0.91
N                                 376177     376177     376177     376177
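Every model in these tables includes firm fixed effects. For a single regressor, one standard way to obtain the fixed-effects slope is the within transformation: subtract each firm's mean from both variables, then run OLS on the demeaned data. The sketch below is a deliberately stripped-down illustration with synthetic data. It omits the controls, the day fixed effects and the two-way clustered standard errors, and the function name `within_slope` is hypothetical.

```python
from collections import defaultdict


def within_slope(observations):
    """OLS slope of y on x after removing firm means (firm fixed effects).

    observations: list of (firm, x, y) tuples.
    """
    # Accumulate per-firm sums of x and y and the observation count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for firm, x, y in observations:
        s = sums[firm]
        s[0] += x
        s[1] += y
        s[2] += 1

    sxy = 0.0
    sxx = 0.0
    for firm, x, y in observations:
        sum_x, sum_y, n = sums[firm]
        dx = x - sum_x / n  # deviation from the firm's mean of x
        dy = y - sum_y / n  # deviation from the firm's mean of y
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


# Synthetic data: y = firm effect + 2 * x, with firm effects 10 and -3.
data = [
    ("AAA", 0.0, 10.0), ("AAA", 1.0, 12.0), ("AAA", 2.0, 14.0),
    ("BBB", 5.0, 7.0), ("BBB", 6.0, 9.0), ("BBB", 7.0, 11.0),
]
slope = within_slope(data)
```

Because the synthetic data are generated with a common slope of 2 and different firm intercepts, demeaning removes the intercepts and the function recovers a slope of 2.0. A pooled regression that ignored the firm effects would mix the within-firm relationship with level differences across firms.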
Table IA. 3
Returns, tweeting volume using Harvard Psychological Dictionary
This table documents the results of the panel regression of returns (in basis points) on the Social Negativity Index
(SNI) and tweeting volume about a given firm. In this table, SNI is defined
as [(number of negative tweets about a firm on a given day - number of positive tweets about a firm on a given
day)/Total number of tweets about a firm on a given day]. Negative tweets are defined using the Harvard Psychological
Dictionary rather than the Loughran and McDonald Dictionary. Tweeting volume is the natural logarithm of the
number of tweets about a firm on a given day. Control variables used but not shown in the table are: Lag_return, the
firm's previous day's return; Earnings day, the day of the firm's earnings announcement; Week before earnings,
the week prior to the firm's earnings announcement; and Week after earnings, the week after the firm's earnings
announcement. Models 1 and 2 also include the volatility index (VIX), market return on the previous day, and day
of the week fixed effects. Firm fixed effects are used in all the models, and models 3 and 4 include day fixed effects.
Panel A shows the results for the full sample while Panel B shows the results for the sample excluding earnings season.
Standard errors, in parentheses, are clustered by firm and day. ***, **, * denote statistical significance at the 1%, 5%,
and 10% levels respectively.
Panel A: Full sample
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     -4.59*     -6.07**    -6.63**    -7.38***
(alternate definition)            (2.76)     (2.76)     (2.55)     (2.53)
Tweeting volume                   -----      25.62***   -----      26.19***
                                             (1.42)                (1.24)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.055      0.060      0.070      0.075
N                                 461632     461632     461632     461632

Panel B: Sample excluding earnings period
Dependent variable: Returns (basis points)
                                  (1)        (2)        (3)        (4)
Social Negativity Index (SNI)     -12.94***  -15.24***  -13.71***  -16.72***
(alternate definition)            (2.16)     (2.26)     (1.71)     (1.74)
Tweeting volume                   -----      25.17***   -----      26.62***
                                             (1.50)                (1.32)
Controls                          Included   Included   Included   Included
Day fixed effects                 No         No         Included   Included
Firm fixed effects                Included   Included   Included   Included
R2                                0.062      0.068      0.079      0.084
N                                 377064     377064     377064     377064
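Table IA. 3 changes only the word list used to flag negative tweets. The sketch below shows how swapping the list can change which tweets count as negative. It rests on several assumptions: the two word sets are toy stand-ins and are not the actual contents of either dictionary, the rule "a tweet is negative if it contains any listed word" is assumed rather than taken from the paper, and the tweets and ticker are made up.

```python
import re

# Toy word sets for illustration only; NOT the actual dictionary contents.
GENERAL_NEGATIVE = {"loss", "decline", "liability", "tax", "cost"}
FINANCE_NEGATIVE = {"loss", "decline", "lawsuit", "default"}


def is_negative(tweet, negative_words):
    """Flag a tweet as negative if it contains any listed word (assumed rule)."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return any(token in negative_words for token in tokens)


def negative_share(tweets, negative_words):
    """Proportion of tweets flagged negative under a given word list."""
    flags = [is_negative(t, negative_words) for t in tweets]
    return sum(flags) / len(flags)


# Made-up tweets about a hypothetical ticker.
tweets = [
    "$AAA reports a quarterly loss",
    "$AAA lowers its tax rate guidance",
    "$AAA announces new product line",
    "$AAA cost cutting boosts margins",
]
share_general = negative_share(tweets, GENERAL_NEGATIVE)
share_finance = negative_share(tweets, FINANCE_NEGATIVE)
```

Under the toy general-purpose list, three of the four tweets are flagged (share 0.75), while the toy finance-specific list flags only the first (share 0.25). The same body of tweets can therefore produce quite different index values depending on the dictionary, which is why the choice of word list is worth checking as a robustness exercise.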