
SENWAVE: MONITORING THE GLOBAL SENTIMENTS UNDER THE COVID-19 PANDEMIC

A PREPRINT

Qiang Yang^a, Hind Alamro^a, Somayah Albaradei^a, Adil Salhi^a, Xiaoting Lv^b, Changsheng Ma^a, Manal Alshehri^a, Inji Jaber^c, Faroug Tifratene^a, Wei Wang^a,b, Takashi Gojobori^a, Carlos M. Duarte^d, Xin Gao^a and Xiangliang Zhang^a

^a Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division and Computational Biosciences Research Center (CBRC), King Abdullah University of Science and Technology (KAUST)
Email: {qiang.yang, hind.alamro, somayah.albaradei, adil.salhi, changsheng.ma, manal.alshehri, faroug.tifratene, wei.wang, takashi.gojobori, xin.gao, xiangliang.zhang}@kaust.edu.sa

^b School of Computer Science and Information Technology, Beijing Jiaotong University, China
Email: [email protected]

^c IT department, King Abdullah University of Science and Technology (KAUST)
Email: [email protected]

^d Red Sea Research Center (RSRC) and Computational Biosciences Research Center (CBRC), King Abdullah University of Science and Technology (KAUST)
Email: [email protected]

June 22, 2020

ABSTRACT

Since the first alert issued by the World Health Organization (5 January 2020), COVID-19 has spread to over 180 countries and territories. As of June 18, 2020, there were over 8,400,000 cases and over 450,000 related deaths in total. The pandemic has caused massive losses in the economy and jobs globally and confined about 58% of the global population. In this paper, we introduce SenWave, a novel sentiment analysis work that uses 105+ million collected tweets and Weibo messages to evaluate the global rise and fall of sentiments during the COVID-19 pandemic. To make a fine-grained analysis of how people feel when facing this global health crisis, we annotated 10K tweets in English and 10K tweets in Arabic in 10 categories: optimistic, thankful, empathetic, pessimistic, anxious, sad, annoyed, denial, official report, and joking. We then used an integrated transformer framework, called simpletransformer, to conduct multi-label sentiment classification by fine-tuning pre-trained language models on the labeled data. For a more complete analysis, we also translated the annotated English tweets into other languages (Spanish, Italian, and French) to generate training data for building sentiment analysis models for these languages. SenWave thus reveals the sentiment of the global conversation on COVID-19 in six languages (English, Spanish, French, Italian, Arabic and Chinese) as it followed the spread of the epidemic. The conversation showed a remarkably similar pattern of rapid rise and slow decline over time across all nations, as well as on special topics like herd immunity strategies, to which the global conversation reacted strongly negatively. Overall, SenWave shows that optimistic and positive sentiments increased over time, foretelling a desire to seek, together, a reset for an improved COVID-19 world.

Keywords Covid-19 · Pandemic · Sentimental analysis · Tweets · Weibo · Fine-grained sentiment annotation

arXiv:2006.10842v1 [cs.SI] 18 Jun 2020

1 Introduction

Since its outbreak, the coronavirus has affected more than 180 countries, causing massive losses in the economy and jobs globally and confining about 58% of the global population. Many people have been forced to work or study from home during the pandemic. Research on people’s feelings is essential for maintaining mental health and staying informed about Covid-19. Social media platforms (e.g., Twitter, Weibo) have played a major role in expressing people’s feelings and attitudes towards Covid-19. We therefore aim to build a system named SenWave to monitor global sentiments under the COVID-19 pandemic using deep-learning-powered sentiment analysis.

Sentiment analysis has been widely researched in the field of natural language processing [1, 2, 3, 4]. Most current sentiment analysis tasks consider coarse-grained emotion labels such as positive, neutral, and negative for reviews or comments on books, products, and movies, or a ranking score from 1 to 5 that indicates the degree of emotion. However, people’s feelings during a pandemic are much more complicated than the sentiments in movie reviews and product comments. For instance, some people may feel angry and sad because Covid-19 leads to an increasing number of deaths and rising unemployment, while others may feel optimistic because of the medical supplies and medical assistance provided to people in need. Therefore, we need to define fine-grained labels to better understand the impact of the health crisis on sentiment.

However, no appropriate and sufficiently large annotated dataset exists to support building SenWave by training deep learning sentiment classifiers. One non-Covid-19 tweet sentiment analysis dataset is available in [5], with 7,724 labeled English tweets, 2,863 Arabic tweets, and 4,240 Spanish tweets (14,827 in total). It is a benchmark dataset labeled in 11 categories: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, and trust. Initially, we tried to use it as training data for a Covid-19 sentiment classifier. However, the resulting sentiment predictions were not suitable for Covid-19 analysis. For example, very few tweets fell into the categories joy, love and trust. In addition, many tweets containing official reports, as well as tweets making jokes or denying conspiracy theories, were classified into inappropriate categories. According to our observations, these kinds of labels are essential for expressing opinions and attitudes.

Due to the lack of appropriate training data, Covid-19 tweet sentiment analysis has mostly relied on engineered features or conventional bag-of-words representations rather than deep learning models, in either unsupervised or supervised settings, and with limited training data (e.g., only 5K in [6]). Yasin et al. built a real-time Covid-19 tweet analyzer that applies an LDA model to USA data with positive, neutral, and negative labels [7]. Jia et al. used an LDA model and the NRC Lexicon on English tweets to predict one of eight emotions (anger, anticipation, fear, surprise, sadness, joy, disgust and trust) [8]. Mohammed et al. employed a naïve Bayes model to predict Saudis’ attitudes towards Covid-19 preventive measures as positive, negative, or neutral. Caleb et al. used a logistic regression classifier with linguistic features, hashtags and tweet embeddings to identify anti-Asian hate and counterhate text [9]. However, these methods are either limited by the sentiment dictionary and its availability or lack a deep understanding of tweet semantics. Even supervised methods such as naïve Bayes perform multi-class classification, which does not fit the real case: emotions about Covid-19 can be a mixture of multiple emotions, which calls for multi-label classification.

To solve the above problems, we first collected over 105 million tweets covering six different languages, including English, Spanish, French, Arabic and Italian, starting from March 1, 2020. Second, to meet the requirement of fine-grained sentiment analysis for Covid-19, we annotated 10,000 tweets in English and 10,000 tweets in Arabic in 10 categories: optimistic, thankful, empathetic, pessimistic, anxious, sad, annoyed, denial, official, and joking. Each tweet was annotated by at least three experienced annotators in the corresponding language under strict quality control. We allowed one tweet to be annotated with more than one category to support multi-label analysis. To analyze over 1 million Chinese Weibo posts about COVID-19, we constructed a set of 21,173 Weibo posts labeled in 7 sentiment categories: anger, disgust, fear, optimism, sadness, surprise, and gratitude. These 41K labeled posts and the over 106 million unlabeled tweets and Weibo posts compose the dataset studied by SenWave. More details about the dataset are given in Sec. 2. We will make the data and the SenWave implementation publicly available to support other social impact analyses of Covid-19 and fine-grained sentiment analysis. To comply with Twitter’s Terms of Service, we will publicly release only the Tweet IDs for the unlabeled data and a limited number of tweet texts for the labeled data (20K in total) for non-commercial research use. To the best of our knowledge, this is the largest labeled Covid-19 sentiment analysis dataset with fine-grained labels.

We summarize our main contributions in this paper as follows:

• We constructed what is, to date, the largest fine-grained annotated Covid-19 tweet dataset (10K English tweets and 10K Arabic tweets) in 10 sentiment categories, which helps facilitate studies of the social impact of Covid-19 and other fine-grained analysis tasks in the research community.

• We share a large set of Covid-19 tweet IDs collected since Mar 1, 2020, in five languages, amounting to over 105 million tweets, which will be continuously updated.

• We demonstrate the usability of the labeled Covid-19 tweets by first evaluating the performance of deep learning classifiers trained on them and then applying the classifiers to the over 105 million unlabeled tweets from March 1 to May 15, 2020, to monitor how global emotions vary on topics of concern and to report other interesting findings.

This is the first report of COVID-19 sentiment over 105 million tweets. The largest COVID-19 tweet dataset analyzed before our work contained 1.8M tweets [8] and was analyzed in an unsupervised way using topic modeling and lexicon features.


2 Dataset Construction

2.1 Data Collection

We used Twint^1, an open-source Twitter crawler, to collect our tweet dataset. Twint allows users to specify a number of parameters alongside the query, such as tweet language and time period. We formed requests with the specified parameters and scraped the resulting responses into JSON documents. We used a unified query across these languages: “covid-19 OR coronavirus OR covid OR corona OR كورونا” (the last term is the Arabic word for “corona”). We launched 12 instances on 24 cores to download daily updates and historical data going back to March 1. Data rates varied slightly throughout the period, averaging a little over a million tweets a day. Tweets were saved as JSON documents and pooled into a shared medium, to be pre-processed and consumed by the language models for sentiment analysis running on a GPU server (GTX 1080 Ti GPU and 20 CPUs).
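The collection step can be illustrated with a minimal sketch that builds the unified OR query and counts daily volume from saved JSON records. This is not the authors' pipeline: the record layout (one JSON object per line with a `date` field in `YYYY-MM-DD` form) and the helper names `build_query` and `daily_volume` are assumptions made for illustration, and the Arabic query term is left out of the list for brevity.

```python
import json
from collections import Counter

# English query terms quoted from the unified query in the text.
QUERY_TERMS = ["covid-19", "coronavirus", "covid", "corona"]


def build_query(terms):
    """Join search terms into a single OR query string."""
    return " OR ".join(terms)


def daily_volume(json_lines, date_field="date"):
    """Count records per day from an iterable of JSON-encoded tweets.

    `date_field` is an assumed field name holding a 'YYYY-MM-DD...' string.
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record[date_field][:10]] += 1
    return dict(counts)


# Hypothetical records, only to show the expected input shape.
sample = [
    '{"id": 1, "date": "2020-03-01", "tweet": "covid update"}',
    '{"id": 2, "date": "2020-03-01", "tweet": "corona news"}',
    '{"id": 3, "date": "2020-03-02", "tweet": "coronavirus"}',
]
print(build_query(QUERY_TERMS))
print(daily_volume(sample))
```

Counting by the first ten characters of the date keeps the sketch independent of whether a time component follows the day.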

Because only a limited number of Chinese tweets could be found on Twitter, our data collection for the sentiment analysis of COVID-19 in China was conducted on Sina Weibo, the largest social media platform in China. The Weibo records were collected through the Sina Weibo API by first collecting hashtags about COVID-19 and then extracting the Weibo records that include these hashtags.

2.2 Data Annotation

We randomly selected 10K English and 10K Arabic tweets for sentiment annotation. These two languages were selected because English and Arabic are among the top five most popular languages in the world^2. In addition, English can be effectively translated into other languages when needed. We thus have a considerably large labeled dataset for tweet sentiment analysis. The sentiment categories were determined by domain experts after reviewing a subset of the collected tweets and discussing them over several rounds. The final set of labels reflects the complicated sentiments in a pandemic. The 10 labels and the auxiliary emotions they cover are: optimistic (representing hopeful, proud, trusting), thankful for the efforts to combat the virus, empathetic (including praying), pessimistic (hopeless), anxious (scared, fearful), sad, annoyed (angry), denial towards conspiracy theories, official report, and joking (ironical).

We recruited over 50 experienced annotators so that every tweet was labeled by at least three annotators. Example tweets with suggested categories were provided to annotators in advance. We allowed each tweet to be assigned multiple labels, in line with convention. For example, the tweet “Dear Covid19, Will you please retaliate on our behalf. We’re hopeless, helpless, restless, speechless.” has a mixture of pessimistic and sad emotions. Multi-label annotation with our fine-grained sentiments supports the analysis of the complicated emotions in the pandemic. To measure the reliability of the sentiment annotations, we conducted a verification study on the annotated tweets, in which the final labels of each tweet were determined by a majority-voting strategy. Our labeled dataset can also be used for other events with complex emotions, e.g., public opinion analysis and general election analysis. What these events have in common is that their emotions are multi-faceted.
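One plausible reading of the majority-voting step for multi-label annotations is to keep every label chosen by a strict majority of a tweet's annotators. The sketch below assumes that per-label rule, which the text does not spell out; the function name `majority_vote` and the example annotations are hypothetical.

```python
from collections import Counter


def majority_vote(annotations, min_votes=None):
    """Return the labels chosen by a strict majority of annotators.

    `annotations` is a list with one collection of labels per annotator.
    `min_votes` overrides the default threshold (more than half).
    """
    n_annotators = len(annotations)
    threshold = min_votes if min_votes is not None else n_annotators // 2 + 1
    # Use set() so an annotator cannot vote twice for the same label.
    votes = Counter(label for labels in annotations for label in set(labels))
    return sorted(label for label, count in votes.items() if count >= threshold)


# Three hypothetical annotators labeling the same tweet.
annotations = [
    {"pessimistic", "sad"},
    {"pessimistic", "sad", "anxious"},
    {"pessimistic"},
]
print(majority_vote(annotations))  # ['pessimistic', 'sad']
```

With three annotators the threshold is two votes, so "anxious" (one vote) is dropped while "pessimistic" and "sad" are kept.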

For Chinese Weibo, we analyzed the COVID-19 posts and annotated 21,173 Weibo posts in 7 sentiment categories: optimistic, thankful, surprised, fearful, sad, angry and disgusted.

Table 1: Statistics of data collected in each language

        En      Es      Ar      Fr      It      Zh
max     1.7M    652K    223K    132K    63K     23K
min     272K    90K     52K     28K     15K     0.6K
mean    898K    273K    105K    64K     35K     9.6K
total   68M     21M     8M      4.9M    2.7M    1.1M

Table 2: Correlations of normalized volume

      Es      Fr      It      Ar
En    0.94    0.96    0.80    0.83
Es            0.96    0.82    0.84
Fr                    0.84    0.86
It                            0.79

2.3 Data Description

The processed data is saved in a git repository and updated with newly fetched data. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs for the unlabeled data and a limited number of tweet texts for the labeled data. Researchers can therefore re-fetch an original tweet if it is still public. This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). The dataset is available on GitHub at https://github.com/gitdevqiang/SenWave.

^1 https://github.com/twintproject/twint
^2 https://www.vicinitas.io/blog/twitter-social-media-strategy-2018-research-100-million-tweets


[Figure 1, top panel: “Collected Tweets in 5 languages”. Daily volume in millions (0 to 3) from 3/1/20 to 5/10/20 for Total, En, Ar, Es, It and Fr, with Sundays marked.]

[Figure 1, bottom panel: normalized volume (0 to 1) from 01/10/20 to 05/15/20 for Zh, En, Ar, Es, It and Fr.]

[Event annotations in the figure: Jan 22, China: Wuhan lockdown. March 12-13, Italy, Spain and France: national lockdown/closure of leisure areas; UK: considered herd immunity; US: a national state of emergency, president was tested; Canada: PM’s wife tested positive. March 21, KSA: King's speech, the entry to two Holy cities was suspended. April 8, China: Wuhan unblocked.]

Figure 1: (Top) The absolute daily volume of COVID-19 tweets from Mar 1 to May 15, 2020, collected in 5 languages: English (En), Spanish (Es), Arabic (Ar), French (Fr), and Italian (It). (Bottom) The normalized volume (normalized to the maximum number, with volume statistics in Table 1) of the 5 languages, as well as the COVID-19 Weibo posts in Chinese (Zh) from Jan 10 to May 15.

2.3.1 Statistics of Unlabeled Tweets

The volumes of daily data collected for each language are illustrated in Fig. 1. The statistics show a similar pattern of rapid rise followed by gradual fall in the global citizen conversation around COVID-19. However, messages in Chinese peaked on January 22, two months earlier than the peaks in all other examined languages on March 12-13 and March 21, reflecting the lag between the development of the epidemic first detected in Wuhan and its spread to pandemic status. Discussion and attention quickly peaked when important decisions were made. For example, Italy, Spain, and France announced national lockdowns or the closure of leisure areas, while the UK considered herd immunity and the US announced a national state of emergency on March 12-13. Meanwhile, in KSA the peak was reached on March 21 due to the King’s speech and the suspended entry to the two Holy cities. In addition, people’s attention cools down as time goes on. As shown in Table 2, the rise and fall in the volume of messages on COVID-19 was remarkably correlated across English, French, Italian, Arabic and Spanish, although the rise occurred earlier in Italian, the language of the first Western nation to suffer the epidemic. The high correlation coefficients indicate that populations speaking different languages responded in a similar way.
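The normalization and correlation described here can be sketched as follows. The text does not name the correlation coefficient, so Pearson correlation is assumed; the two daily series are made-up numbers for illustration only and are not taken from the collected data.

```python
import math


def normalize_to_max(series):
    """Scale a daily volume series so that its peak equals 1."""
    peak = max(series)
    return [value / peak for value in series]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily volumes for two languages (illustrative values).
en = [300_000, 900_000, 1_700_000, 1_200_000, 800_000]
es = [100_000, 300_000, 650_000, 400_000, 250_000]

r = pearson(normalize_to_max(en), normalize_to_max(es))
print(round(r, 3))
```

Because Pearson correlation is invariant to positive rescaling, normalizing each series to its maximum changes the plotted curves but not the coefficient.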

2.3.2 Information of Annotated Tweets

The label distributions are shown in Table 3. Note that the percentages do not sum to 100% because of the multi-label annotation in En and Ar. In English, the joking and annoyed emotions account for large portions, which is consistent with reality, since Covid-19 causes deaths, high unemployment rates and other problems. However, optimistic is the third largest category, indicating that people feel confident about combating the virus and about the future. In Arabic, official is significantly higher than the other categories because, since the outbreak of Covid-19, most governments of Arabic-speaking countries have announced many decisions regarding different situations on Twitter.

Table 3: The label distributions in annotated English (En) and Arabic (Ar) datasets (%)

      Opti.   Than.   Empa.   Pess.   Anxi.   Sad     Anno.   Deni.   Offi.   Joki.
En    23.73   4.98    3.89    13.25   16.95   21.33   34.92   6.31    12.07   44.76
Ar    11.27   3.33    6.49    4.65    7.53    10.8    17.17   2.1     34.52   14.18

      Optimistic   Thankful   Surprised   Fearful   Sad     Angry   Disgusted
Zh    16.98        14.51      14.85       15.24     13.02   15.00   10.40

English tweet examples of each category are shown in Table 4, where some tweets have more than one label and some have as many as three. Based on the category statistics, we find that more than 70% of English tweets were assigned more than one label, while only about 20% of Arabic tweets were. We also present the relations between these labels in Fig. 2. While the label co-occurrence in Arabic forms three blocks (positive, negative and neutral), the label co-occurrence in English is more complicated. These observations imply that multi-label classification on our English dataset is more challenging than on the Arabic one. We also illustrate this fact in the experiments.
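A label co-occurrence matrix of the kind shown in Fig. 2, together with the share of tweets carrying more than one label, could be computed along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the toy label sets are invented, and the diagonal is taken to hold the per-label counts.

```python
LABELS = [
    "optimistic", "thankful", "empathetic", "pessimistic", "anxious",
    "sad", "annoyed", "denial", "official report", "joking",
]


def cooccurrence(label_sets, labels=LABELS):
    """Build a symmetric count matrix; entry [i][j] counts tweets with both labels.

    The diagonal entry [i][i] is the number of tweets carrying label i.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for label_set in label_sets:
        ids = [index[label] for label in label_set]
        for i in ids:
            for j in ids:
                matrix[i][j] += 1
    return matrix


def multi_label_share(label_sets):
    """Fraction of tweets annotated with more than one label."""
    return sum(len(s) > 1 for s in label_sets) / len(label_sets)


# Toy annotations, not taken from the dataset.
toy = [{"sad", "annoyed"}, {"sad"}, {"joking", "annoyed"}]
matrix = cooccurrence(toy)
sad, annoyed = LABELS.index("sad"), LABELS.index("annoyed")
print(matrix[sad][sad], matrix[sad][annoyed], multi_label_share(toy))
```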

[Figure 2 heatmaps: cell counts recovered from the two panels. Column abbreviations follow Table 3.]

(a) English tweets

                  Opti.  Than.  Empa.  Pess.  Anxi.  Sad    Anno.  Deni.  Offi.  Joki.
Optimistic        2373   235    171    226    246    291    379    72     156    982
Thankful          235    498    28     15     41     29     67     14     70     92
Empathetic        171    28     389    18     50     71     41     7      7      63
Pessimistic       226    15     18     1325   268    272    420    90     62     554
Anxious           246    41     50     268    1695   360    452    95     138    510
Sad               291    29     71     272    360    2133   723    54     186    747
Annoyed           379    67     41     420    452    723    3492   261    122    1235
Denial            72     14     7      90     95     54     261    631    51     184
Official report   156    70     7      62     138    186    122    51     1207   95
Joking            982    92     63     554    510    747    1235   184    95     4476

(b) Arabic tweets

                  Opti.  Than.  Empa.  Pess.  Anxi.  Sad    Anno.  Deni.  Offi.  Joki.
Optimistic        1127   199    366    0      0      0      0      0      3      0
Thankful          199    333    40     0      0      0      0      0      0      0
Empathetic        366    40     694    0      0      0      0      0      0      0
Pessimistic       0      0      0      465    124    163    204    17     0      0
Anxious           0      0      0      124    753    250    215    24     0      0
Sad               0      0      0      163    250    1079   484    28     0      0
Annoyed           0      0      0      204    215    484    1717   125    0      0
Denial            0      0      0      17     24     28     125    210    0      0
Official report   3      0      0      0      0      0      0      0      3452   0
Joking            0      0      0      0      0      0      0      0      0      1418

Figure 2: Heatmaps of co-occurrence of labels for English and Arabic tweets.

Table 4: English examples of each category

Category               Example

Single label
Optimistic             Nothing last forever, Corona Virus will Vanish this month. “Happy New Month”
Thankful               Gratitude to those who are involved to safeguard our lives from fatal Corona virus. Thanks to them. #LetUsPrayForCoronaFighters
Empathetic             Allah ap ko bhi safa ata fermain. Ameen. Be strong. IA Allah will give full and speedy recovery. #coronavirus
Pessimistic            things won’t go back to normal #COVID19 #coronavirus #pandemic #MIT
Anxious                I don’t feel good and I don’t know if I’m just exhausted from working so much or if I have corona
Sad                    When someone you know.. apart of your family dies from the Coronavirus it’s shocking; unexplainable. My whole day has been down.
Annoyed                Stop asking to change location man hat how you will spread corona. Foooook
Denial                 Unpopular and Insensitive Thought... Corona and Quarantine is a marketing campaign by OTT platforms...!!
Official report        Now schools in Ontario won’t be open until May due to the Coronavirus which might post-pone the 2019-2020 school year to July or August.
Joking                 Calling Corona Virus “rona” like she the nastiest little girl in the 5th grade. #coronavirusmemes #5G

Multiple labels
Empathetic, Sad        So heart breaking any way you see it Prayers to all the families affected by the Covid-19.
Pessimistic, Joking    if i get curved ima go somewhere packed to give myself coronavirus
Anxious, Pessimistic   Does everyone realize we’re going to reach a million cases of this coronavirus by the weekend?
Denial, Sad, Annoyed   Why is it that no one ever reports on the number of people who recovered from Coronavirus?
Joking, Annoyed        My uncle paranoid about corona virus but still goes to work ....... pick one


3 Model Introduction

3.1 Data Preprocessing

We pre-processed the raw data to ensure the quality of the analysis. In detail, we first removed @user mentions and URLs from the tweets because they do not contribute to the analysis. Then, we removed emojis and emoticons, even though they can express emotions well, because we focused on analyzing textual data. Next, we filtered out noisy symbols and texts that cannot convey meaningful semantic or lexical information and may even hinder the model from learning, such as the retweet symbol “RT” and special characters including line breaks, tabs and redundant blank characters. Unlike previous methods, which also removed hashtags from tweets, we kept hashtags since they carry meaningful semantics, as in “Proud to be one of the few people who hasn’t texted their ex #Covid-19 #Quarantine #lockdown”. In addition, we conducted word tokenization, stemming and tagging with the NLTK toolkit (https://www.nltk.org/) for English, Spanish, French and Italian, and with PyArabic (https://github.com/linuxscout/pyarabic) for Arabic. We used Jieba (https://github.com/fxsjy/jieba) for Chinese Weibo segmentation.
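The cleaning rules above can be approximated with a few regular expressions. The sketch below is an illustration rather than the authors' preprocessing code: the patterns are assumptions, the emoji ranges are approximate, and ASCII emoticons such as ":)" are not handled. Hashtags are deliberately kept, as described in the text.

```python
import re

# Leading retweet marker, optionally followed by the retweeted user.
RT_RE = re.compile(r"^\s*RT\s+(?:@\w+:?\s*)?")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Approximate emoji coverage: pictographs, misc symbols, dingbats, variation selector.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove RT markers, URLs, mentions and emojis; collapse whitespace."""
    text = RT_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    # Line breaks, tabs and repeated blanks all collapse to single spaces.
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "RT @user: Stay home! https://t.co/abc #Covid-19\n\t#lockdown \U0001F637"
print(clean_tweet(raw))  # Stay home! #Covid-19 #lockdown
```

Removing the retweet marker before the mentions avoids leaving a stray colon behind when a tweet starts with "RT @user:".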

3.2 Multi-label Sentiment Classifiers

We built our multi-label sentiment classifiers on deep neural network language models because of their success on diverse NLP tasks. An integration framework called simpletransformer (https://simpletransformers.ai/) supports fine-tuning these pre-trained models and training a customized classifier. We used XLNet [10] for English, AraBERT [11] for Arabic, and ERNIE [12] for Chinese (selected because ERNIE performed better than BERT [13] and LSTM).
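As a rough sketch of how such a classifier might be set up with Simple Transformers, the snippet below fine-tunes a multi-label model using the batch size, learning rate and epoch count reported in Sec. 3.3. It is not the authors' training script: the checkpoint name `xlnet-base-cased`, the helper names, and the label order are assumptions, and the training data frame is expected to have a `text` column and a `labels` column of multi-hot lists.

```python
# Hyperparameters as reported in Sec. 3.3.
TRAIN_ARGS = {
    "train_batch_size": 16,
    "learning_rate": 4e-5,
    "num_train_epochs": 20,
}

LABELS = [
    "optimistic", "thankful", "empathetic", "pessimistic", "anxious",
    "sad", "annoyed", "denial", "official report", "joking",
]


def to_multi_hot(label_set, labels=LABELS):
    """Encode a set of label names as a 0/1 vector in a fixed label order."""
    return [1 if label in label_set else 0 for label in labels]


def train_english_classifier(train_df, use_cuda=True):
    """Fine-tune a multi-label classifier on a DataFrame with 'text' and 'labels'."""
    # Imported lazily so the helpers above work without the library installed.
    from simpletransformers.classification import MultiLabelClassificationModel

    model = MultiLabelClassificationModel(
        "xlnet",
        "xlnet-base-cased",  # assumed checkpoint; the paper does not name one
        num_labels=len(LABELS),
        args=TRAIN_ARGS,
        use_cuda=use_cuda,
    )
    model.train_model(train_df)
    return model


print(to_multi_hot({"sad", "joking"}))
```

After training, `model.predict(list_of_texts)` returns the predicted multi-hot vectors together with the raw model outputs.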

We first evaluated the performance of the English and Arabic sentiment classifiers on the 10K annotated tweets in each language by 5-fold cross validation; the Chinese classifier was evaluated on the 21K labeled Weibo posts. Then all 10K labeled tweets were used to train the final sentiment classifier for each language, except the Chinese classifier, which was trained on the 21K labeled Weibo posts. The trained models were then used to predict the sentiments of millions of Covid-19 tweets (Mar 1 - May 15, 2020 for non-Chinese data and January 20 - May 15, 2020 for Chinese Weibo messages) for our analysis.
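The 5-fold protocol amounts to partitioning the labeled samples into five disjoint folds and holding each one out in turn. A minimal sketch follows, assuming a shuffled split with a fixed seed; the function name `k_fold_indices` and the seed are illustrative choices, not details from the paper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_id, (train, test) in enumerate(k_fold_indices(10, k=5)):
    print(fold_id, len(train), len(test))
```

Each metric would then be computed once per held-out fold and summarized across the five folds.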

Considering that machine translation between English and Spanish, French, and Italian is well developed, we translated the labeled English tweets into Spanish, French, and Italian with Google Translate (https://translate.google.com/) to examine whether our classifiers can work well in these languages. We manually checked a subset of the translated tweets and were surprised by the high quality of the translation. We used BERT [13] for representation learning on Spanish, French and Italian tweets, and then followed the same steps for sentiment analysis.

3.3 Experimental Setting and Evaluation Metrics

We ran the experiments on a workstation with one GeForce GTX 1080 Ti GPU with 11,178 MB of memory. The batch size is 16, and the learning rate is 4e-5 with 20 epochs. We used multi-label accuracy (Jaccard index), F1-macro, and F1-micro, as well as the weak accuracy, as the performance metrics. The accuracy with the Jaccard index is defined as:

$$\mathrm{Jac.Acc} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap \hat{Y}_i|}{|Y_i \cup \hat{Y}_i|}$$

where $Y_i$ is the set of ground-truth labels for the $i$-th testing sample, $\hat{Y}_i$ is the set of predicted labels, and $|D|$ is the number of testing samples. The weak accuracy of multi-label classification is defined as:

$$\mathrm{Acc} = \frac{1}{|D| \cdot m} \sum_{i=1}^{|D|} \sum_{j=1}^{m} \sigma(\hat{y}_{ij} = y_{ij})$$

where $\sigma(\hat{y}_{ij} = y_{ij})$ is 1 if the predicted $\hat{y}_{ij}$ equals the ground truth $y_{ij}$ and 0 otherwise. Here $y_{ij} = 1$ means that the $i$-th testing sample has the $j$-th label, and $y_{ij} = 0$ means that it does not. The total number of correct predictions is averaged over $|D| \cdot m$, where $m$ is the number of labels. In addition, we used the label ranking average precision score (LRAP) and the Hamming loss, which are specific to multi-label classification.
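The two accuracy definitions, together with the Hamming loss, can be written out in a few lines of plain Python. This is a sketch for illustration: the function names and the toy predictions are made up, and scoring two empty label sets as a perfect Jaccard match is an assumption, since the formula leaves that case undefined.

```python
def jaccard_accuracy(y_true, y_pred):
    """Mean of |Y ∩ Ŷ| / |Y ∪ Ŷ| over all samples."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        truth, pred = set(truth), set(pred)
        union = truth | pred
        # Assumption: two empty label sets count as a perfect match.
        total += len(truth & pred) / len(union) if union else 1.0
    return total / len(y_true)


def weak_accuracy(y_true, y_pred, labels):
    """Fraction of (sample, label) decisions that agree with the ground truth."""
    correct = 0
    for truth, pred in zip(y_true, y_pred):
        truth, pred = set(truth), set(pred)
        correct += sum((label in truth) == (label in pred) for label in labels)
    return correct / (len(y_true) * len(labels))


def hamming_loss(y_true, y_pred, labels):
    """Fraction of (sample, label) decisions that disagree with the ground truth."""
    return 1.0 - weak_accuracy(y_true, y_pred, labels)


labels = ["sad", "annoyed", "joking", "optimistic"]
y_true = [{"sad", "annoyed"}, {"joking"}]
y_pred = [{"sad"}, {"joking", "annoyed"}]
print(jaccard_accuracy(y_true, y_pred))       # 0.5
print(weak_accuracy(y_true, y_pred, labels))  # 0.75
print(hamming_loss(y_true, y_pred, labels))   # 0.25
```

The toy case shows why the weak accuracy is much higher than the Jaccard accuracy: labels that are correctly absent count as correct decisions in the former but contribute nothing to the latter.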


4 Results and Analysis

4.1 Performance of Classifier

We present the 5-fold cross-validation results of our multi-label classifiers on SenWave in Table 5. Our classifiers reach weak accuracy values above 80%, which indicates the effectiveness of our models. The multi-label Jaccard accuracy is about 0.5 for English (0.495) and 0.589 for Arabic. However, the results on the translated Spanish, French and Italian tweets are lower than those on the original English data. There may be two reasons: 1) the use of different pre-trained language models: XLNet, used for English tweets, and AraBERT, used for Arabic tweets, generally perform better than BERT, used for Spanish, French and Italian, under the same conditions [10, 11]; 2) the difficulty of classifying multi-label English tweets due to the complex label combinations (shown in Fig. 2 (a)). We are working on improving these models. It is worth noting that the F1 values are only around 0.5 (between 0.43 and 0.63) due to the class imbalance issue, which we will address in future work. However, the high accuracy and LRAP and the low Hamming loss suggest that the trained classifiers are usable in practice. The multi-class classification accuracy on Chinese Weibo, shown in Table 5, led us to select ERNIE (accuracy 0.88) for the final analysis because it performed better than BERT (accuracy 0.83) and LSTM (accuracy 0.78).

Table 5: Validation of the SenWave on the labeled records

          Acc            Jac. Acc       F1-Macro        F1-Micro       LRAP           Hamming Loss
English   0.847 ± 0.004  0.495 ± 0.008  0.517 ± 0.012   0.573 ± 0.008  0.745 ± 0.007  0.153 ± 0.004
Arabic    0.905 ± 0.002  0.589 ± 0.010  0.520 ± 0.018   0.630 ± 0.009  0.661 ± 0.009  0.111 ± 0.002
Spanish   0.823 ± 0.001  0.428 ± 0.004  0.434 ± 0.010   0.511 ± 0.003  0.684 ± 0.002  0.177 ± 0.001
French    0.824 ± 0.004  0.431 ± 0.011  0.431 ± 0.0116  0.510 ± 0.011  0.687 ± 0.006  0.176 ± 0.004
Italian   0.827 ± 0.002  0.438 ± 0.006  0.441 ± 0.011   0.516 ± 0.005  0.693 ± 0.005  0.173 ± 0.002
Chinese   0.88 ± 0.001 (ERNIE Acc)

4.2 Sentiments Variation in Six Languages Over Days

We present the sentiment variation in 6 languages in Fig. 3, from March 1 to May 15, 2020 for non-Chinese data and from January 20 to May 15, 2020 for Chinese Weibo messages. The statistics of these sentiment results are given in Table 6.

The sentiment results of English tweets are shown in Fig. 3 (a). All the positive emotions, including optimistic, thankful and empathetic, showed a similar trend of first rising and then falling. This implies that people first felt positive due to the various decisions made for combating the virus starting from mid-March. However, the emotions went down in late April when a large number of people got infected. Among the negative emotions, anxious and joking fell with slopes of −0.0004 and −0.0007, respectively, as time went on, while the others stayed stable with slight changes. The anxiety may have been reduced by the increase in medical supplies and by the fact that people knew much more about Covid-19 than before and had become used to living with it. However, the resulting high unemployment rate and the high number of deaths may be the reason that sad and annoyed have stayed high.
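The slopes quoted above come from linear regression lines fit to each emotion curve. A minimal sketch of such a fit using ordinary least squares is given below. The `fit_line` helper, the use of a plain day index as the x-axis, and the sample fractions are all assumptions made for illustration; the section does not state which x-axis unit or fitting tool was used for the figures.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily fractions of one emotion over five consecutive days.
days = [0, 1, 2, 3, 4]
fractions = [0.140, 0.139, 0.141, 0.138, 0.137]

slope, intercept = fit_line(days, fractions)
trend = "falling" if slope < 0 else "rising or flat"
```

A negative slope corresponds to an emotion whose daily share declines over the observation window, as reported for anxious and joking in the English tweets.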

The results of Arabic tweets shown in Fig. 3 (b) demonstrate significant variations in all categories of emotions. In particular, optimistic has been rising, while anxious, denial and joking have been falling. The sad emotion keeps rising due to the increasing number of new cases in several Arabic-speaking populations, such as Saudi Arabia, Qatar and the United Arab Emirates (UAE).

The rise of optimistic and thankful and the fall of pessimistic and annoyed were also observed in Spanish tweets, as shown in Fig. 3 (c). A similar increase in thankful is observed in French tweets, as shown in Fig. 3 (d). However, the other emotions stayed stable, except for the decline of joking and the sudden increase of denial in response to the conspiracy theory that the coronavirus originated in a lab. Italian tweets also showed weak increasing or decreasing trends in most of the emotions, as shown in Fig. 3 (e), except those in thankful and empathetic.

The Chinese Weibo sentiments show strong variations, but no obvious increasing or decreasing trend, as shown in Fig. 3 (f). The most significant decrease in fearful is observed at the very beginning, on January 20, 2020, the day when human-to-human transmission was confirmed. The fearful state continued until January 22, when Wuhan was locked down, and the arrival of Chinese New Year. The significant jump in sad on April 4 was due to the nationwide memorial for victims of Covid-19.

4.3 Sentiments Variation of Selected Areas Over Days

We selected several countries and areas, including the USA, Washington D.C., the UK, Spain, Argentina and Saudi Arabia, to illustrate how the sentiments vary over days in Fig. 4. The statistics of these sentiment results are given in Table 7.


[Figure 3: six panels, one per language, each containing one plot per emotion with a fitted linear trend line. The x-axis is the date (3/1/20 to 5/10/20 in panels (a)-(e); 1/20/20 to 5/11/20 in panel (f)). The trend-line equations printed in the plots are:]

(a) English: Optimistic y = 0.0002x - 6.6752; Thankful y = 0.0002x - 9.9211; Empathetic y = -5E-06x + 0.2134; Pessimistic y = -1E-05x + 0.5293; Anxious y = -0.0004x + 17.745; Denial y = 8E-05x - 3.4356; Annoyed y = -7E-05x + 3.113; Sad y = 5E-05x - 2.0694; Joking y = -0.0007x + 29.241

(b) Arabic: Sad y = 0.0004x - 18.96; Annoyed y = -7E-05x + 3.0992; Denial y = -0.0003x + 15.339; Anxious y = -0.0019x + 81.827; Pessimistic y = 0.0001x - 4.7747; Joking y = -0.0006x + 26.209; Optimistic y = 0.0011x - 46.413; Thankful y = 3E-05x - 1.2095; Empathetic y = -0.0005x + 21.85

(c) Spanish: Sad y = 0.0001x - 4.8061; Annoyed y = -0.0002x + 7.8015; Denial y = -4E-05x + 1.6796; Anxious y = -4E-05x + 1.9926; Pessimistic y = -0.0002x + 9.5154; Joking y = -0.0011x + 48.516; Optimistic y = 0.0004x - 16.986; Thankful y = 0.0001x - 6.4073; Empathetic y = 8E-06x - 0.3604

(d) French: Optimistic y = 0.0001x - 6.4247; Thankful y = 8E-05x - 3.395; Empathetic y = -8E-06x + 0.3641; Anxious y = 7E-05x - 2.7785; Pessimistic y = 5E-05x - 2.2843; Joking y = -0.0009x + 38.816; Sad y = -8E-05x + 3.6339; Annoyed y = -0.0004x + 15.571; Denial y = -1E-07x + 0.0175

(e) Italian: Sad y = 0.0002x - 6.9736; Annoyed y = -0.0002x + 7.8678; Denial y = -4E-05x + 1.6989; Anxious y = 8E-06x - 0.2521; Pessimistic y = -6E-05x + 2.6793; Joking y = -0.0003x + 14.159; Optimistic y = 6E-05x - 2.6721; Thankful y = -4E-05x + 1.9465; Empathetic y = -2E-05x + 1.0749

(f) Chinese: Fearful y = -0.0002x + 9.8529; Sad y = -0.0001x + 6.2432; Angry y = 8E-05x - 3.4399; Surprised y = 0.0006x - 26.297; Optimistic y = -0.0003x + 12.186; Thankful y = -0.0002x + 7.2706; Disgusted y = 0.0001x - 4.8157

Figure 3: Sentiment variation of all languages over time. Each subfigure shows the results of one language, including 9 emotions for non-Chinese tweets and 7 emotions for Chinese Weibo messages. A linear regression line is fit to each emotion curve, showing the trend of the emotion variation.

Fig. 4 (a) shows that in the USA, the portion of negative emotions is higher than that of positive emotions. On March 12, people felt annoyed and anxious (see the pie charts) since normal life was affected by the coronavirus, e.g., the cancellation of sport events and the suspension of transportation. On March 21, however, the positive emotions increased slightly when people showed gratitude for the efforts of healthcare workers. Similarly, on March 28, people were urged to follow the guidelines to stay coronavirus free. However, the negative emotions went up once again on April 11 due to the increasing rates of death, infection and unemployment. Several days later, on April 18, these figures rose further because the USA death toll was the highest in the world and was still rising quickly.

Fig. 4 (b) shows that in Washington D.C., on March 2, negative emotions (shown by the blue line) reached a peak due to a public health official announcement saying that the outbreak of a new coronavirus could soon become a pandemic (see the pie chart on the right-hand side). On March 13, there was a positive spike due to the declaration of a National Day of Prayer amid the coronavirus pandemic. On March 29, negative emotions jumped because the number of deaths within one day reached 2000, with an especially significant increase in anxious because of the neglect of social distancing, including people even holding gatherings. On April 13, many people showed gratitude to the medical professionals and front-line workers (see the pie chart on the right-hand side), while on April 27 and 28, as well as on May 10, people expressed their dissatisfaction with the government due to the resistance.

Table 6: The statistics of daily sentiment fraction in different categories in all languages, presented as mean±std (first row of each language), and the number of tweets/weibo from which the statistics were obtained (second row).

      Opti.      Than.      Empa.      Joki.      Pess.      Anxi.      Sad        Anno.      Deni.
En    0.19±0.02  0.06±0.02  0.01±0.0   0.17±0.02  0.06±0.0   0.14±0.02  0.1±0.01   0.23±0.01  0.05±0.01
      15758721   5399197    816664     13822859   4721744    10872331   8181085    18445343   4014464
Es    0.18±0.02  0.03±0.01  0.0±0.0    0.22±0.04  0.07±0.01  0.14±0.01  0.13±0.01  0.2±0.01   0.03±0.0
      4284423    688944     123807     5472179    1672356    3410825    3203353    4759697    671636
Ar    0.17±0.05  0.07±0.02  0.11±0.03  0.18±0.02  0.03±0.01  0.14±0.06  0.08±0.02  0.19±0.02  0.02±0.01
      844165     392832     627644     943901     127120     794588     399315     941615     138903
Fr    0.14±0.02  0.02±0.01  0.0±0.0    0.22±0.03  0.11±0.01  0.14±0.01  0.13±0.01  0.22±0.01  0.02±0.0
      884553     158938     17301      1351669    650069     874491     814349     1357922    108614
It    0.17±0.01  0.03±0.01  0.01±0.0   0.24±0.01  0.07±0.0   0.14±0.01  0.11±0.01  0.22±0.01  0.02±0.0
      486093     78086      15332      683951     201141     391758     300875     604932     49141

      Optimistic Thankful   Surprised  Fearful    Sad        Angry      Disgusted
Zh    0.21±0.05  0.12±0.04  0.12±0.04  0.19±0.07  0.09±0.03  0.14±0.04  0.13±0.03
      244472     143374     124820     206767     105974     155083     143833

Table 7: The statistics of daily sentiment fraction in different categories of different countries and areas, presented as mean±std (first row of each area), and the number of tweets from which the statistics were obtained (second row).

       Opti.      Than.      Empa.      Pess.      Anxi.      Sad        Anno.      Deni.      Offi.      Joki.
USA    0.16±0.02  0.05±0.01  0.01±0.0   0.04±0.0   0.11±0.02  0.09±0.01  0.2±0.01   0.03±0.01  0.12±0.01  0.18±0.02
       271961     83907      15011      67987      179757     157241     318295     48253      184803     308588
D.C.   0.17±0.03  0.07±0.02  0.01±0.01  0.04±0.01  0.1±0.03   0.08±0.02  0.18±0.02  0.03±0.01  0.17±0.03  0.14±0.02
       2178       816        84         551        1305       1064       2349       377        2195       1846
UK     0.2±0.03   0.07±0.02  0.01±0.01  0.05±0.01  0.12±0.02  0.1±0.01   0.17±0.02  0.03±0.01  0.11±0.02  0.15±0.03
       34834      13220      1574       7525       19233      16606      27761      3977       17611      25245
Spain  0.18±0.03  0.03±0.01  0.01±0.0   0.06±0.01  0.11±0.01  0.13±0.02  0.18±0.02  0.02±0.01  0.09±0.02  0.19±0.04
       4354       669        145        1556       2807       3255       4560       567        2246       4620
Arge.  0.15±0.03  0.02±0.01  0.0±0.0    0.07±0.02  0.12±0.02  0.11±0.02  0.18±0.02  0.03±0.01  0.09±0.03  0.23±0.04
       2938       304        56         1452       2381       2277       3654       504        1708       4897
KSA    0.19±0.06  0.09±0.03  0.14±0.05  0.02±0.01  0.1±0.06   0.06±0.01  0.12±0.03  0.02±0.01  0.16±0.06  0.11±0.03
       15137      7943       18162      1269       8171       4623       6597       385        13261      6804

Fig. 4 (c) shows that in the UK, on March 9, the negative emotions were caused by panic buying of hand sanitizer and toilet roll, people's fear of the coronavirus, and the oil price war leading to the plunge of the FTSE 100. After different coronavirus measures were imposed, the positive sentiment went up significantly. Zooming in on the figure reveals further detailed interpretations.

In Spain (Fig. 4 (d)), people applauded from their balconies for the health care workers treating the coronavirus on March 15, and on March 22 felt angry about the extension of the state of alarm by another 15 days and sad about the third-highest number of deaths (see the pie chart).

In Argentina, shown in Fig. 4 (e), the proportion of negative emotions was very close to 0.5 and even much higher on some days. On March 8, discussions about the first death case of coronavirus and dengue led to an increase in anxious, sad and annoyed (see the pie chart on the right-hand side). On March 21, the feelings of stress, anxiety and panic went up because of the long quarantine, which resulted in an increase in anxious and sad. On April 29, more than 2,300 prisoners were released because of the coronavirus, which increased the feelings of pessimistic, anxious and annoyed.

Fig. 4 (f) shows stronger positive sentiment in Saudi Arabia than in other countries or areas. In particular, starting from March 13, there was an increase in positive emotions when many decisions were taken by the Saudi government. The peak was reached on March 21, in response to a tweet by the Saudi minister of health: “We are all responsible, staying home is our strongest weapon against the virus”. Another positive peak appeared on April 23-24, when Ramadan started.


[Figure 4: six panels, one per region. Each panel is a stacked bar chart with one bar per day from 2020-03-01 to 2020-05-15 and a y-axis from 0% to 100%. Legend: Optimistic, Thankful, Empathetic, Pessimistic, Anxious, Sad, Annoyed, Denial, Official report, Joking, pos-ratio, neg-ratio. Each panel also contains two pie charts and text annotations on the spikes, listed below.]

(a) USA
Pie chart, March 12: Optimistic 13%, Thankful 2%, Empathetic 1%, Pessimistic 5%, Anxious 13%, Sad 10%, Annoyed 21%, Denial 2%, Official report 9%, Joking 24%.
Pie chart, April 18: Optimistic 16%, Thankful 6%, Empathetic 1%, Pessimistic 4%, Anxious 11%, Sad 9%, Annoyed 21%, Denial 3%, Official report 11%, Joking 18%.
Annotations: "Cancellation of sport events, suspend of transportation"; "USA death toll is the highest in the world and it is still rising quickly"; "Increasing number of death, infection, unemployment"; "The gratitude to people fighting for the coronavirus and the calls of doing everyone's part"; "Following the guidelines to stay coronavirus free".

(b) Washington D.C.
Pie chart, March 2: Optimistic 14%, Thankful 3%, Empathetic 1%, Pessimistic 9%, Anxious 17%, Sad 5%, Annoyed 16%, Denial 6%, Official report 16%, Joking 13%.
Pie chart, April 13: Optimistic 21%, Thankful 10%, Empathetic 3%, Pessimistic 2%, Anxious 6%, Sad 11%, Annoyed 16%, Denial 2%, Official report 16%, Joking 13%.
Annotations: "The passage of $2.2 trillion coronavirus relief package"; "Warnings from public health officials: outbreak of a new coronavirus could soon become a pandemic"; "It is unacceptable due to the high number of death one day"; "Dissatisfaction with the government"; "Gratitude to leaders due to their efforts"; "The thank to health care workers and the recall to stay at home"; "President declares Sunday National Day of Prayer amid coronavirus pandemic"; "Annoyed and anxious about the large number of infection and death".

(c) UK
Pie chart, March 9: Optimistic 13%, Thankful 3%, Empathetic 0%, Pessimistic 7%, Anxious 17%, Sad 10%, Annoyed 19%, Denial 3%, Official report 9%, Joking 19%.
Pie chart, April 11: Optimistic 19%, Thankful 8%, Empathetic 1%, Pessimistic 4%, Anxious 11%, Sad 12%, Annoyed 18%, Denial 3%, Official report 10%, Joking 14%.
Annotations: "Panic buying for hand sanitizer and toilet roll and billions wiped off value of leading companies as FTSE 100 plunges amid coronavirus fears and oil price war"; "The concern about the lack of available ppe and the sadness to the death persons"; "19 NHS workers have died"; "Complaints about the late action and herd immunity to the government"; "Petition for protecting the arts from economic devastation and appreciation to the NHS with priority test"; "Praying for Boris Johnson and calling to stay at home"; "Businesses call for donations for the NHS workers".

(d) Spain
Pie chart, March 22: Optimistic 15%, Thankful 2%, Empathetic 0%, Pessimistic 7%, Anxious 11%, Sad 14%, Annoyed 20%, Denial 2%, Official report 12%, Joking 17%.
Pie chart, April 7: Optimistic 23%, Thankful 4%, Empathetic 1%, Pessimistic 6%, Anxious 10%, Sad 12%, Annoyed 16%, Denial 3%, Official report 11%, Joking 14%.
Annotations: "The anger at the extension of another 15 days' alarm and sad about the third highest number of death about coronavirus"; "The complaints about the infected and death number due to the government"; "Spain sees sharp drop in daily Covid-19 death toll"; "Applause for the health care workers treating the coronavirus on the balcony"; "Spanish deaths fall for fourth consecutive day and the slow increase of new cases and deaths"; "People can go outside"; "The sadness of old people's death".

(e) Argentina
Pie chart, March 8: Optimistic 7%, Thankful 1%, Empathetic 0%, Pessimistic 7%, Anxious 16%, Sad 17%, Annoyed 20%, Denial 4%, Official report 8%, Joking 20%.
Pie chart, March 20: Optimistic 17%, Thankful 3%, Empathetic 0%, Pessimistic 8%, Anxious 10%, Sad 10%, Annoyed 17%, Denial 2%, Official report 9%, Joking 24%.
Annotations: "Discussions about first dead case of coronavirus and dengue"; "People feel stress, anxiety, panic for quarantine"; "More than 2,300 prisoners was released because of the coronavirus"; "Retirees should not be allowed to line up for pensions in banks"; "The thank to doctors, nurses and assistants in their fight against the Argentine coronavirus"; "More protocols should be articulated to ensure isolation conditions for the coronavirus".

(f) Saudi Arabia
Pie chart, March 21: Optimistic 33%, Thankful 16%, Empathetic 26%, Pessimistic 1%, Anxious 8%, Sad 2%, Annoyed 5%, Denial 1%, Official report 3%, Joking 5%.
Pie chart, March 29: Optimistic 14%, Thankful 10%, Empathetic 9%, Pessimistic 2%, Anxious 15%, Sad 5%, Annoyed 15%, Denial 3%, Official report 12%, Joking 15%.
Annotations: "King's speech"; "Government support the private sector by paying 60% of citizens’ wages"; "Trend #Hashtag: we are all responsible"; "Everyone pass school"; "A partial lifting of the curfew"; "Schools suspended"; "Jeddah locked, further curfew in Makkah"; "Ramadan starts"; "Umrah suspended".

Figure 4: Sentiment variation in different regions over time. Each bar shows the distribution of sentiments on one day, where sentiments are shown in different colors. The blue curve and purple curve show the positive (sum of optimistic, thankful, empathetic in yellow at different intensity) and the negative (sum of pessimistic, anxious, sad, annoyed, denial in blue at different intensity), respectively. (Better zoom in to see the interpretation of spikes)
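The positive and negative curves described in the caption can be sketched as sums over the corresponding emotion groups. The example below uses the March 12 USA pie-chart percentages from Fig. 4 (a). The `pos_neg_ratio` helper is hypothetical, and normalizing by the total over all ten categories (including official report and joking) is an assumption, since the caption does not state the denominator.

```python
from typing import Dict, Tuple

# Emotion groups as listed in the Figure 4 caption.
POSITIVE = ("optimistic", "thankful", "empathetic")
NEGATIVE = ("pessimistic", "anxious", "sad", "annoyed", "denial")


def pos_neg_ratio(shares: Dict[str, float]) -> Tuple[float, float]:
    """Return (positive ratio, negative ratio) for one day's sentiment shares."""
    total = sum(shares.values())  # assumption: normalize over all categories
    pos = sum(shares[name] for name in POSITIVE) / total
    neg = sum(shares[name] for name in NEGATIVE) / total
    return pos, neg


# Percentages read from the March 12 pie chart of Fig. 4 (a).
usa_march_12 = {
    "optimistic": 13, "thankful": 2, "empathetic": 1,
    "pessimistic": 5, "anxious": 13, "sad": 10, "annoyed": 21, "denial": 2,
    "official report": 9, "joking": 24,
}

pos, neg = pos_neg_ratio(usa_march_12)  # 16% positive versus 51% negative
```

Under these assumptions the negative share on that day is roughly three times the positive share, in line with the observation that negative emotions dominate in the USA panel.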

4.4 Sentiments Variation of Studied Topics Over Days

We analyze the sentiments of 7 topics: stock market, oil price, herd immunity, economic stimulus, drug/medicine/vaccine, employment/job, and working from home. The results are shown in Figure 5, and the statistics of these sentiment results are given in Table 8.

For the topic stock market, discussion peaked on March 9, when the markets collapsed. On this day, anxious reached a high value, greater than mean+2*std (above the black dashed line; the solid black line is the mean and the dotted line is mean-2*std). On March 12, the DJI (Dow Jones Industrial Average) had its worst day since 1987, plunging about 10% and triggering the circuit breakers for the second time, and the volume reached its second-largest value. The anxious level remained high during these days. On the weekends of March 20-21 and March 28-29, the spikes of denial rose above the blue dashed line (mean+2*std), reflecting the continuing stock market collapse. The DJI is plotted on top of the denial curve in Fig. 5(a), with markers on March 9, 20 and 28, the days on which these negative sentiments coincided with drops in the DJI.
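The mean+2*std rule used above to call a day's sentiment significant can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name spike_days, the use of the population standard deviation, and the example fractions are all assumptions, since the paper does not state how the standard deviation was computed or over which window.

```python
from statistics import mean, pstdev


def spike_days(series, k=2.0):
    """Return (upper, lower, spikes) for a {date: fraction} daily series.

    upper and lower are mean + k*std and mean - k*std; spikes lists the
    dates whose fraction lies above the upper threshold.
    """
    values = list(series.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std: an assumption, not from the paper
    upper = mu + k * sigma
    lower = mu - k * sigma
    spikes = [day for day, value in series.items() if value > upper]
    return upper, lower, spikes


# Hypothetical daily fractions of "anxious" tweets, not values from the paper.
anxious = {
    "2020-03-06": 0.06,
    "2020-03-07": 0.07,
    "2020-03-08": 0.06,
    "2020-03-09": 0.18,
    "2020-03-10": 0.08,
    "2020-03-11": 0.07,
    "2020-03-12": 0.06,
    "2020-03-13": 0.07,
}

upper, lower, spikes = spike_days(anxious)
print(spikes)
```

With the population standard deviation, a series of five or fewer days can never contain a value strictly above mean+2*std, so the statistics need to be taken over a reasonably long window.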

The topic oil price also reached its peak of discussion on March 9. The drop in the crude oil price led to significant anxious sentiment on March 9-12. However, this was not the worst point. On April 21, the crude oil price reached an 18-year low, shown as the marked point on the WTI crude oil curve. In the discussion this triggered, pessimistic was significant.

The topic herd immunity quickly reached its peak on March 14-15, after the UK government initially considered it on March 13. In the intensive discussion from March 13 to 17, denial and joking were significantly observed on


[Figure 5 panels. Each panel shows stacked daily sentiment shares (0% to 100%) from 3/1/20 to 5/10/20, with the topic's tweet volume (#) as a background curve and inset curves for selected sentiments. Legend: Optimistic, Thankful, Empathetic, Joking, Pessimistic, Anxious, Sad, Annoyed, Denial, volume (#).
(a) Stock market. Annotations: Anxious peak on Mar 9; Denial peak on weekends of Mar 20-21, 28-29. Inset: DJI: Dow Jones Industrial Average.
(b) Oil price. Annotations: Anxious on Mar 9-12; Pessimistic on Apr 20. Inset: WTI Crude Oil.
(c) Herd immunity. Annotations: Denial and Joking peak on Mar 15-16; Annoyed from Mar 22 to Apr 7; Denial in April 12-13.
(d) Economic stimulus. Annotations: Denial in Mar 25; Increase of Joking in Mar 24-30, Apr 13-18.
(e) Drug/medicine and vaccine. Annotations: More Anxious, less Optimistic; Less Anxious, more Optimistic; Denial and Annoyed on Mar 15-16 (Germany tries to stop U.S. poaching German firm seeking coronavirus vaccine); Denial and Annoyed on Apr 6-7 (Anti-Malaria drugs were hyped as unproven coronavirus treatment).
(f) Employment/job. Hot words: unemployment 18%, income 17%, rent 16%, salary 11%, mortgage 8%, laid off 7%, no job 6%, out of work 5%. Annotations: Increase of Optimistic, Decrease of Annoyed; Less and less Optimistic, More and more Annoyed; Peak of Anxious May 8-10, Unemployment rate rose to a record 14.7%.
(g) Working from home. Annotations: Optimistic slope: 0.0015; Anxious slope: -0.0008; Optimistic dropped on Apr 18 (A UK police chief was criticised on a covid-19 mass gathering) and May 10 (PM calls on UK to go back to work).]

Figure 5: Sentiment variation on seven topics. We show the sentiment results for these topics when they were intensively discussed (around the peak of the volume curve in the background).

March 15-16. The discussion continued with significant annoyed from March 22 to April 7, followed by another rise of denial on April 12-13.

The topic economic stimulus peaked on March 26, when the US Senate passed a historic $2tn relief package, and peaked again on April 15-16, when the checks were received. Surprisingly, during the discussion on March 23-26, positive was lower than on other days, and denial was significant on March 25. Many tweets under this topic said, for example, “this is not enough”, “US economy is tanking”, and “the pandemic is getting worse”. Joking increased on March 24-30 and April 13-18.

The topic drug/medicine/vaccine collected the largest amount of discussion among these 7 topics (a daily volume of 20-40K). This topic has been hot since the global outbreak around March 10. Two events in this topic caused significant denial and annoyed. The first was on March 15-16, when Germany tried to stop the U.S. from poaching a German firm seeking a coronavirus vaccine. The second was on April 6-7, when anti-malaria drugs were hyped as an unproven coronavirus treatment. Overall, from March to May, we see two periods of more anxious and less optimistic, and two other periods of less anxious and more optimistic.

The topic employment/job covers hot words such as unemployment, income, rent, salary, mortgage, laid off, and no job/work, as listed in the table included in Fig. 5(f). In March, we see an increase of optimistic and a decrease of annoyed; in April-May, however, we see less optimistic and an increase of annoyed. The peak of anxious was found on May 8-10, when the reported April unemployment rate in the US rose to a record 14.7%.

The topic working from home (WFH) is a warm topic, with clearly more positive sentiment than all other topics. The bars in Fig. 5(g) show that optimistic accounted for more than 40% in WFH, much higher than optimistic in the other topics (between 10% and 25%) shown in Fig. 5(a-f). We also see that optimistic keeps increasing with a slope of 0.0015, while anxious is decreasing with a slope of −0.0008. Optimistic dropped on only two days: April 18, when a UK police chief was criticised over a covid-19 mass gathering, and May 10, when the PM called on the UK to go back to work.
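One plausible way to obtain trend slopes such as 0.0015 and −0.0008 is an ordinary least-squares line fitted to the daily fraction against the day index. The paper does not state how the slopes were computed or in which unit, so the sketch below is an assumption: trend_slope is a hypothetical helper, the slope is read as change in fraction per day, and the series is synthetic.

```python
def trend_slope(values):
    """Least-squares slope of values against the day index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two days to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    variance = sum((i - x_mean) ** 2 for i in range(n))
    return covariance / variance


# Synthetic series built to rise by 0.0015 per day; not data from the paper.
optimistic = [0.40 + 0.0015 * day for day in range(60)]
print(round(trend_slope(optimistic), 4))
```

Under the per-day reading, a slope of 0.0015 sustained over about ten weeks amounts to a rise of roughly 0.1 in the optimistic fraction.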

Table 8: Statistics of the daily sentiment fraction in each category under each topic, presented as mean±std, together with the number of tweets from which the statistics were obtained.

Topic        Opti.         Than.         Empa. (10−2)  Joki.         Pess.         Anxi.         Sad           Anno.         Deni.
Stock        0.09 ± 0.03   0.01 ± 0.01   0.08 ± 0.09   0.11 ± 0.02   0.14 ± 0.03   0.07 ± 0.02   0.32 ± 0.05   0.07 ± 0.02   0.19 ± 0.03
             10875         31970         138           26890         15147         22376         10325         48110         10333
Oil          0.10 ± 0.03   0.01 ± 0.01   0.18 ± 0.17   0.25 ± 0.04   0.18 ± 0.05   0.1 ± 0.03    0.2 ± 0.05    0.06 ± 0.02   0.1 ± 0.02
             6862          909           141           7325          20122         15330         7546          14755         4352
Herd Imm.    0.18 ± 0.05   0.01 ± 0.01   0.19 ± 0.55   0.13 ± 0.03   0.24 ± 0.05   0.02 ± 0.01   0.23 ± 0.04   0.1 ± 0.04    0.09 ± 0.04
             17698         1250          108           11605         13723         26275         2335          27006         13865
Econ. Stim.  0.19 ± 0.05   0.05 ± 0.02   0.13 ± 0.25   0.07 ± 0.03   0.09 ± 0.03   0.05 ± 0.03   0.3 ± 0.05    0.07 ± 0.03   0.18 ± 0.04
             31079         8717          169           33923         11270         14986         8586          55648         12483
Drug         0.25 ± 0.03   0.07 ± 0.01   0.50 ± 0.15   0.07 ± 0.01   0.17 ± 0.02   0.03 ± 0.0    0.22 ± 0.03   0.07 ± 0.01   0.12 ± 0.02
             537330        154398        10749         249193        143002        347756        70442         461216        150892
Job          0.2 ± 0.05    0.06 ± 0.02   0.41 ± 0.18   0.09 ± 0.01   0.13 ± 0.01   0.15 ± 0.04   0.28 ± 0.04   0.02 ± 0.01   0.07 ± 0.02
             209210        61106         4489          68115         84671         127257        156168        255922        20443
WFH          0.43 ± 0.05   0.1 ± 0.02    0.28 ± 0.17   0.05 ± 0.01   0.12 ± 0.04   0.07 ± 0.01   0.1 ± 0.02    0.01 ± 0.01   0.11 ± 0.02
             125176        31970         975           32971         14188         36042         21320         30036         3271
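To make the structure of a Table 8 entry concrete, the sketch below derives a mean±std string and a tweet count from per-day counts for one category of one topic. It is an illustration under assumptions, not the authors' procedure: table_cell is a hypothetical helper, the daily fraction is taken as category tweets divided by topic tweets on that day, the population standard deviation is used, and the counts are invented.

```python
from statistics import mean, pstdev


def table_cell(daily_counts):
    """Summarize one (topic, category) pair as ('mean ± std', tweet count).

    daily_counts holds one (tweets_in_category, tweets_on_topic) pair per day.
    """
    fractions = [cat / total for cat, total in daily_counts if total > 0]
    n_tweets = sum(cat for cat, _ in daily_counts)
    summary = f"{mean(fractions):.2f} ± {pstdev(fractions):.2f}"
    return summary, n_tweets


# Invented counts for four days; not data from the paper.
days = [(40, 100), (90, 200), (66, 150), (43, 100)]
summary, n_tweets = table_cell(days)
print(summary, n_tweets)  # prints: 0.43 ± 0.02 239
```

Averaging the daily fractions weights every day equally, regardless of how many tweets that day contributed, which differs from dividing the total category count by the total topic count.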

5 Conclusion

This paper contributed SenWave, a Covid-19 sentiment analysis system with annotated datasets, which includes 20K labeled English and Arabic tweets and 21K labeled Chinese Weibo messages, as well as 106M+ Covid-19 tweets and Weibo messages collected since March 1, 2020 and January 20, respectively. We trained classifiers for 6 languages based on deep learning language models to monitor the global sentiments under the Covid-19 pandemic. We analyzed how the sentiments of all languages and hot topics varied over days. On one hand, the emotions in these languages directly reflect the corresponding events on specific days, through variations in volume and significant increases of emotions. On the other hand, we analyzed the sentiment trends of 7 topics by showing the corresponding emotions. This work provides a rich resource for the community to study and combat COVID-19.

References

[1] Ansari Fatima Anees, Arsalaan Shaikh, Arbaz Shaikh, and Sufiyan Shaikh. Survey paper on sentiment analysis:Techniques and challenges. Technical report, EasyChair, 2020.

[2] Lei Zhang, Shuai Wang, and Bing Liu. Deep learning for sentiment analysis: A survey. Wiley InterdisciplinaryReviews: Data Mining and Knowledge Discovery, 8(4):e1253, 2018.

[3] Vishal Kharde, Prof Sonawane, et al. Sentiment analysis of twitter data: a survey of techniques. arXiv preprintarXiv:1601.06971, 2016.

[4] Walaa Medhat, Ahmed Hassan, and Hoda Korashy. Sentiment analysis algorithms and applications: A survey.Ain Shams engineering journal, 5(4):1093–1113, 2014.

[5] Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. Semeval-2018 Task1: Affect in tweets. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), NewOrleans, LA, USA, 2018.

[6] Bennett Kleinberg, Isabelle van der Vegt, and Maximilian Mozes. Measuring emotions in the covid-19 real worldworry dataset. arXiv preprint arXiv:2004.04225, 2020.

12

Page 13: EN : MONITORING THE GLOBAL SENTIMENTS …the pre-trained language model on the labeled data. Meanwhile, in order for a more complete analysis, Meanwhile, in order for a more complete

A PREPRINT - JUNE 22, 2020

[7] Md Kabir, Sanjay Madria, et al. Coronavis: A real-time covid-19 tweets analyzer. arXiv preprint arXiv:2004.13932,2020.

[8] Jia Xue, Junxiang Chen, Chen Chen, ChengDa Zheng, and Tingshao Zhu. Machine learning on big data fromtwitter to understand public reactions to covid-19. arXiv preprint arXiv:2005.08817, 2020.

[9] Caleb Ziems, Bing He, Sandeep Soni, and Srijan Kumar. Racism is a virus: Anti-asian hate and counterhate insocial media during the covid-19 crisis. arXiv preprint arXiv:2005.12423, 2020.

[10] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. Xlnet:Generalized autoregressive pretraining for language understanding. In Advances in neural information processingsystems, pages 5754–5764, 2019.

[11] Wissam Antoun, Fady Baly, and Hazem Hajj. Arabert: Transformer-based model for arabic language understand-ing. arXiv preprint arXiv:2003.00104, 2020.

[12] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. Ernie: Enhanced languagerepresentation with informative entities. arXiv preprint arXiv:1905.07129, 2019.

[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectionaltransformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

13

Page 14: EN : MONITORING THE GLOBAL SENTIMENTS …the pre-trained language model on the labeled data. Meanwhile, in order for a more complete analysis, Meanwhile, in order for a more complete

A PREPRINT - JUNE 22, 2020

Appendix

We present the hot words of the predicted English tweets for each category on March 9, 2020, shown in Fig. 6. More representations for different languages will be provided in the future. The class optimistic is represented by hand washing and health, meaning that people should wash their hands frequently to stay healthy. The class thankful is represented by Covid-19 testing, while the class empathetic is represented by pray, hope, god and safe. The class pessimistic is reflected by economy market, oil market and a large number of deaths. These hot words also suit the class anxious. People felt sad about the many deaths and confirmed cases and the lockdown of schools. The class annoyed is represented by dont and flu, while the class denial is represented by market and China, since some people didn’t believe China's Covid-19 reports. Overall, the hot words in each category can represent the sentiments to some extent.

(a) Optimistic (b) Thankful (c) Empathetic (d) Pessimistic

(e) Anxious (f) Sad (g) Annoyed (h) Denial

(i) Official report (j) Joking

Figure 6: Hot words of each category for English tweets on March 9, 2020
