


FakeFlow: Fake News Detection by Modeling the Flow of Affective Information

Bilal Ghanem¹, Simone Paolo Ponzetto², Paolo Rosso¹, Francisco Rangel³

¹Universitat Politècnica de València, Spain
²University of Mannheim, Germany
³Symanto Research, Germany

[email protected], [email protected], [email protected], [email protected]

Abstract

Fake news articles often stir the readers’ attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers’ emotions. To capture this, we propose in this paper to model the flow of affective information in fake news articles using a neural architecture. The proposed model, FakeFlow, learns this flow by combining topic and affective information extracted from text. We evaluate the model’s performance with several experiments on four real-world datasets. The results show that FakeFlow achieves superior results when compared against state-of-the-art methods, thus confirming the importance of capturing the flow of the affective information in news articles.

1 Introduction

In today’s information landscape, fake news is used to manipulate public opinion (Zhou and Zafarani, 2018) by reshaping readers’ views on certain issues. To achieve this goal, authors of fake news narratives need to capture the reader’s interest, so they put effort into making their articles look more objective and realistic. This is usually done by adding misleading terms or events that can have a negative or positive impact on the readers’ emotions.

False information in short texts, e.g., fake claims or misleading headlines, might be less harmful than in news articles. Such texts may contain eye-catching terms that aim to manipulate the readers’ emotions (Chakraborty et al., 2016), and in many cases, identifying this kind of exaggeration in a short statement is enough to unmask the fabrication. In fake news articles, on the other hand, the authors exploit the length of the news to conceal their fabricated story.

This exposes readers to emotional manipulation while reading longer texts that contain several imprecise or fabricated plots. The flow of information has been investigated for different tasks: Reagan et al. (2016) studied the emotional arcs in stories in order to understand complex emotional trajectories; Maharjan et al. (2018) model the flow of emotions over a book and quantify its usefulness for predicting the book’s success; Kar et al. (2018) explore the problem of creating tags for movies from plot synopses using emotions.

Unlike previous works (Rashkin et al., 2017; Shu et al., 2018; Castelo et al., 2019; Ghanem et al., 2020) that discarded the chronological order of events in news articles, in this work we propose a model that takes into account the affective changes in texts to detect fake news. We hypothesize that fake news has a different distribution of affective information across the text compared to real news, e.g., more fear emotion in the first part of the article or more offensive terms overall. Therefore, modeling the flow of such information may help discriminate fake from real news. Our model consists of two main sub-modules for topic-based and affective information detection. We combine these two sub-modules since a news article’s topic may correlate with its affective information. For example, a fake news article about Islam or Black people is likely to provoke fear and express negative sentiment, while another fake news article in favor of a particular politician might try to evoke more positive emotions and also express some exaggerations. The contributions of our work are as follows:

• We design a model that detects fake news articles by taking into account the flow of affective information1.

1 Available at https://github.com/bilalghanem/fake_flow


• Extensive experiments on four standard datasets demonstrate the effectiveness of our model over state-of-the-art alternatives.

• We build a novel fake news dataset, called MultiSourceFake, that is collected from a large set of websites and annotated on the basis of the joint agreement of a set of news sources.

2 Related Work

Previous work on fake news detection is mainly divided into two lines, with a focus either on social media (Zubiaga et al., 2015; Aker et al., 2017; Ghanem et al., 2019) or on online news articles (Tausczik and Pennebaker, 2010; Horne and Adali, 2017; Rashkin et al., 2017; Barrón-Cedeño et al., 2019). In this work we focus on the latter. Fact-checking (Karadzhov et al., 2017; Zlatkova et al., 2019; Shu et al., 2019a) is another closely related research topic. However, fact-checking targets only short texts (that is, claims) and focuses on using external resources (e.g., the Web or knowledge sources) to verify the factuality of the news. Previous work on fake news detection focuses mainly on proposing new feature sets. Horne and Adali (2017) present a set of content-based features, including readability (number of unique words, SMOG readability measure, etc.), stylistic (frequency of part-of-speech tags, number of stop words, etc.) and psycholinguistic features (i.e., several categories from the LIWC dictionary (Tausczik and Pennebaker, 2010)). When these features are fed into a Support Vector Machine (SVM) classifier and applied, for instance, to the task of distinguishing satire from real news, they obtain high accuracies. Using the same features for the task of fake news detection, however, results in somewhat lower scores. Pérez-Rosas et al. (2018) propose a model (FakeNewsDetector) that uses a feature set consisting of unigrams and bigrams, psycholinguistic, readability, punctuation and dependency-based syntactic features, and they evaluate the performance of their model in a cross-domain experiment. Rashkin et al. (2017) use a model based on n-gram features with a Max-Entropy classifier and apply it to a dataset with different types of fake news articles (e.g., satire, hoax, propaganda, etc.). Similar to the previous work, the authors evaluate their system’s performance on in-domain and out-of-domain test sets, respectively. News, and in particular fake news, are dynamic in nature and change constantly. In order to approach this dynamic nature, Castelo et al. (2019) propose a topic-agnostic model (TopicAgnostic) based on morphological (counts of part-of-speech tags), psycholinguistic (personal concerns, affection, and perception categories from the LIWC dictionary), readability (Gunning Fog metric, etc.) and Web-markup features that capture patterns of the Web pages’ layout (frequency of advertisements, presence of an author name, etc.). All of the morphological, psycholinguistic and readability features in the TopicAgnostic model were extracted from headlines and texts of the news articles. The approach obtains better performance than FakeNewsDetector on three different datasets using an SVM classifier. FakeNewsTracker (Shu et al., 2019b) is a deep neural network-based model that consists of two branches: one encodes news article texts and the other encodes social media engagements (e.g., tweets and their replies). A similar model called Emotionally Infused Network (EIN) is proposed in Ghanem et al. (2020). EIN encodes the text of an article and its affective content, based on several dictionaries, and then combines the two vector representations. The authors evaluate their model on a multi-class false information dataset and show the effectiveness of using emotion features extracted from the text. Despite the large variety of features and models that have been explored in previous work, none of these works considers the sequence of affective information in text; instead, they feed entire news articles as one segment into their models. In contrast, the aim of our work is to evaluate this source of information using a neural architecture.

3 The FakeFlow Model

Given an input document, the FakeFlow model first divides it into N segments. Then it uses both word embeddings and other affective features such as emotions, hyperbolic words, etc., in a way that catches the flow of emotions in the document. The model learns to pay attention to the flow of affective information throughout the document in order to detect whether it is fake or real.
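As an illustration of this first step, the following is a minimal Python sketch of splitting an article into N word-level segments. The equal-size chunking shown here is an assumption for illustration; the paper does not spell out the exact splitting rule.

```python
# Minimal sketch: split a document into N roughly equal word segments.
# The equal-size strategy is an assumption, not the authors' exact rule.
def split_into_segments(text, n_segments=10):
    words = text.split()
    seg_len = max(1, len(words) // n_segments)
    segments = [words[i * seg_len:(i + 1) * seg_len] for i in range(n_segments - 1)]
    segments.append(words[(n_segments - 1) * seg_len:])  # last segment keeps the remainder
    return [" ".join(seg) for seg in segments]

article = "BREAKING: A terrifying storm is heading towards the city tonight ..."
print(len(split_into_segments(article)))  # -> 10
```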

Figure 1 shows the architecture of the FakeFlow model. The neural architecture has two main modules: the first module uses a Convolutional Neural Network (CNN) to extract topic-based information from articles (left branch); the second module models the flow of the affective information within the articles via Bidirectional Gated Recurrent Units (Bi-GRUs) (right branch).


Figure 1: The architecture of the FakeFlow model.

3.1 Topic-based Information

Given a segment n ∈ N of words, the model first embeds words to vectors through an embedding matrix. Then it uses a CNN that applies convolution processes and max pooling to get an abstractive representation of the input segment. This representation highlights important words, in which the topic information of the segment is summarized. Then it applies a fully connected layer on the output segments to get a smaller representation (v_topic) for later concatenation with the representation of affective information:

v_topic = f(W_a v_cnn + b_a)

where W_a and b_a are the corresponding weight matrix and bias term, and f is an activation function such as ReLU, tanh, etc.
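A minimal Keras sketch of this topic branch is shown below. The filter and layer sizes are illustrative placeholders, not the tuned values from the search in Appendix A.

```python
# Minimal Keras sketch of the topic branch for one segment:
# embedding -> CNN + max pooling -> dense layer producing v_topic.
# All sizes are illustrative; the paper tunes them (see Appendix A).
from tensorflow.keras import layers, Input, Model

segment = Input(shape=(800,))                                   # word ids of one padded segment
x = layers.Embedding(input_dim=50000, output_dim=300)(segment)  # embedding matrix
x = layers.Conv1D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)                # abstractive summary highlighting key words
v_topic = layers.Dense(16, activation="relu")(x)  # v_topic = f(W_a v_cnn + b_a)

topic_branch = Model(segment, v_topic)  # applied to every segment (e.g. via TimeDistributed)
```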

Key to FakeFlow is its ability to capture the relevance of the affective information with respect to the topics. For this, we concatenate the topic summary vector v_topic with the representation vector v_affect, aimed at capturing the affective information extracted from each segment (Section 3.2).

v_concat = v_topic ⊕ v_affect

To merge the different representations and capture their joint interaction in each segment, the model processes the concatenated vector v_concat with another fully connected layer:

v_fc = f(W_c v_concat + b_c)

In order to create an attention-focused representation of the segments that highlights important ones, and to give the model the ability to weight segments differently according to the similarity of neighboring segments, the model applies a context-aware self-attention mechanism (Zheng et al., 2018) on v_fc. This is a crucial step, as the importance of a segment at timestep t is related to the other segments, since they share the same context in the news article. Moreover, applying the attention layer can help us understand which features are most relevant by showing which words the network attends to during learning. The output of the attention layer is an attention matrix l_t with scores for each token at each timestep.
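The context-aware self-attention of Zheng et al. (2018) has no off-the-shelf layer; as a simplified stand-in, the sketch below applies plain scaled dot-product self-attention over the segment vectors, so that each segment is re-weighted by its similarity to the other segments. This is a substitute mechanism for illustration, not the authors' exact attention layer.

```python
# Simplified stand-in for the context-aware self-attention of Zheng et al.
# (2018): scaled dot-product self-attention over the segment vectors v_fc,
# so each segment is weighted by its similarity to neighbouring segments.
import tensorflow as tf

def segment_self_attention(v_fc):
    """v_fc: (batch, n_segments, dim) -> context-weighted output l_t, same shape."""
    dim = tf.cast(tf.shape(v_fc)[-1], tf.float32)
    scores = tf.matmul(v_fc, v_fc, transpose_b=True) / tf.sqrt(dim)  # (batch, N, N)
    weights = tf.nn.softmax(scores, axis=-1)  # how much each segment attends to the others
    return tf.matmul(weights, v_fc)           # l_t: attention-focused segment representations
```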

3.2 Affective Flow of Information

To model the affective information flow in the news articles, we choose the following lexical features, under the assumption that they have a different distribution across the articles’ segments. We use a term frequency representation weighted by the articles’ length to extract the following features from each segment n:

• Emotions: We use emotions as features to detect their change among articles’ segments. For that we use the NRC emotion lexicon (Mohammad and Turney, 2010), which contains ∼14K words labeled with Plutchik’s eight emotions (8 Features).

• Sentiment: We extract the sentiment of the text, positive and negative, again using the NRC lexicon (Mohammad and Turney, 2010) (2 Features).

• Morality: We consider cue words from the Moral Foundations Dictionary2 (Graham et al., 2009), where words are assigned to one (or more) of the following categories: care, harm, fairness, unfairness (cheating), loyalty, betrayal, authority, subversion, sanctity and degradation (10 Features).

• Imageability: We use a list of words rated by their degree of abstractness and imageability3. These words were extracted from the MRC psycholinguistic database (Wilson, 1988) and then annotated with degrees of abstractness and imageability using a supervised learning algorithm. The list contains 4,295 and 1,156 words rated by their degree of abstractness and imageability, respectively (2 Features).

• Hyperbolic: We use a list of ∼350 hyperbolic words (Chakraborty et al., 2016), i.e., words with high positive or negative sentiment (e.g., terrifying, breathtakingly, soul-stirring). The authors extracted these eye-catching words from clickbait news headlines (1 Feature).

2 https://moralfoundations.org/other-materials/
3 https://github.com/ytsvetko/metaphor/tree/master/resources/imageability

To model the flow of the above features, we represent each segment of an article by a vector v_affect capturing all 23 features listed above. Then we feed the document’s vectors to a Bi-GRU network to summarize the contextual flow of the features from both directions4 and obtain v_flow.
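The sketch below illustrates how such a per-segment affect vector can be computed. The lexicon entries shown are toy placeholders, not the actual NRC, Moral Foundations, imageability, or hyperbolic resources.

```python
# Minimal sketch of building the per-segment affect vector: count lexicon
# hits and weight by the article's length. The lexicon contents are toy
# placeholders standing in for the 23 real feature categories.
LEXICONS = {
    "fear": {"terrifying", "storm", "kill"},         # one of the 8 NRC emotions
    "positive": {"great", "joyful", "powerful"},     # NRC sentiment
    "harm": {"kill", "hurt"},                        # a Moral Foundations category
    "hyperbolic": {"terrifying", "breathtakingly"},  # Chakraborty et al. (2016)
    # ... remaining categories, 23 features in total
}

def affect_vector(segment_words, article_length):
    """Term-frequency features for one segment, weighted by article length."""
    return [sum(word in lexicon for word in segment_words) / article_length
            for lexicon in LEXICONS.values()]

segment = "a terrifying storm could kill thousands tonight".split()
print(affect_vector(segment, article_length=400))
```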

Given the segments’ flow representation (v_flow) of an article and their relevance to the topics (l_t), FakeFlow applies a dot product operation and then averages the output matrix across the segments to get a compact representation v_compact, which is then fed into a fully connected layer:

v_final = f(W_d v_compact + b_d)

Finally, to generate the overall factuality label of an article, a softmax layer is applied to the output of the fully connected layer.
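A minimal sketch of this fusion and classification step follows. Reading the "dot product operation" as an element-wise product per segment is our assumption, and the layer sizes are placeholders.

```python
# Minimal sketch of the fusion step: combine the attention output l_t with
# the Bi-GRU flow representation v_flow, average over segments, then classify.
# Interpreting the dot product as an element-wise product is an assumption.
import tensorflow as tf
from tensorflow.keras import layers

def fuse_and_classify(l_t, v_flow, n_classes=2):
    """l_t, v_flow: (batch, n_segments, dim) tensors with matching shapes."""
    joint = l_t * v_flow                       # per-segment interaction
    v_compact = tf.reduce_mean(joint, axis=1)  # average across segments
    v_final = layers.Dense(16, activation="relu")(v_compact)       # f(W_d v_compact + b_d)
    return layers.Dense(n_classes, activation="softmax")(v_final)  # fake vs. real
```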

4 Fake News Datasets

Despite recent efforts at debunking online fake news, there is a dearth of publicly available datasets. Most of the available datasets are small in size (e.g., the Politifact5 dataset in Shu et al. (2018) has ∼700 available articles, the Celebrity dataset in Pérez-Rosas et al. (2018) has ∼500 articles, etc.), their test parts have not been manually annotated, or they have been collected from a very small number of news sources. Nonetheless, we evaluate FakeFlow on three different available datasets to demonstrate its performance. In addition, we create our own dataset. Table 1 gives an overview of the datasets that we use in our work.

MultiSourceFake: We rely on different resources for creating the training and test portions of the dataset, so as to provide a challenging benchmark.

4 During prototyping, GRU produced better overall results than LSTM.

5 https://www.politifact.com/

For the training part, we use the OpenSources.co (OS), MediaBiasFactCheck.com (MBFC), and PolitiFact6 news website lists. The OS list contains 560 domains, the MBFC list has 548 domains, and the PolitiFact list has 227 domains. These lists have been annotated by professional journalists. They contain domains of online news websites annotated based on content type (as in the OS list: satire, reliable, etc.; and in the PolitiFact list: imposter, parody, fake news, etc.) or from a factuality perspective (as in the MBFC list: low, medium, and high factuality). From the OS list, we select domains that are in one of the following categories: fake, bias, reliable, hate, satire, or conspiracy. We consider domains under the reliable category as real news sources, and the rest as fake. The PolitiFact list differs from the OS list in that it only has labels for domains that are either fake or with mixed content. We discard the mixed ones7

and map the remaining ones to the fake news label. Finally, we select from the MBFC list those domains that are annotated as either high or low factuality news and map them to real and fake labels, respectively. Out of these three final lists, we select only those domains for our dataset that are annotated consistently across all lists; for example, we discard domains that are annotated as real in the OS list but labeled fake (low factuality) in the MBFC list. The final list contains 85 news websites. We then project the domain-level ground truth onto the content of those domains and randomly sample articles, with a maximum of 100 news articles per domain.8
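One reading of this agreement rule is sketched below, with toy domain names: a domain is kept only if every list that mentions it assigns it the same mapped label.

```python
# Minimal sketch (toy domain names) of keeping only domains whose mapped
# labels agree across the OS, MBFC, and PolitiFact lists.
def consistent_domains(*label_maps):
    """Each argument maps domain -> 'real' or 'fake' after per-list mapping."""
    merged = {}
    for labels in label_maps:
        for domain, label in labels.items():
            merged.setdefault(domain, set()).add(label)
    # keep a domain only if all lists that mention it agree on one label
    return {d: lbls.pop() for d, lbls in merged.items() if len(lbls) == 1}

os_list = {"examplenews.com": "real", "examplehoax.net": "fake"}
mbfc_list = {"examplenews.com": "real", "examplehoax.net": "real"}  # conflict
print(consistent_domains(os_list, mbfc_list))  # {'examplenews.com': 'real'}
```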

For the test part, we use the leadstories.com fact-checking website, for which professional journalists annotated online news articles at the article level as fake or real. We do not follow the way we annotated the training part, since the projection of the domain-level ground truth inevitably introduces noise. The journalists who annotated leadstories.com assigned a set of labels to the fake news articles, e.g., false, no evidence, satire, misleading, etc.; we map them all to the fake label. In addition, we discard all articles that are multimedia-based. After collecting the news articles, we postprocess them by discarding very short articles (less than 30 words). The test part includes 689 fake news articles. We complement the set with a sample of 1,000 real news articles from the training part.

6 https://www.politifact.com/article/2017/apr/20/politifacts-guide-fake-news-websites-and-what-they/

7 The discarded label is “Some fake stories”.

8 Some of the websites included fewer than 100 news articles.


Figure 2: The distribution of document lengths in the MultiSourceFake dataset.

Name              Total    Training   Test
MultiSourceFake   11,397   9,708      1,689
TruthShades       23,000   16,000     4,000 / 3,000
PoliticalNews     14,240   11,392     2,848
FakeNewsNet       20,208   16,156     4,039

Table 1: Number of articles in the datasets. TruthShades has two test sets (in-domain / out-of-domain).

The overall dataset consists of 5,994 real and 5,403 fake news articles. The average document length (number of words) in the MultiSourceFake dataset is 422 words, and the 95th percentile value is 942. Figure 2 shows the distribution of document lengths in the dataset.

TruthShades: This dataset was proposed in Rashkin et al. (2017). It was crawled from a set of domains annotated by professional journalists as either propaganda, hoax, satire, or real. The dataset was built from the English Gigaword corpus for real news, and from seven other unreliable domains, each annotated with one of the three false information labels above.

PoliticalNews: Based on the observation that “a classifier trained using content from articles published at a given time is likely to become ineffective in the future” (Castelo et al., 2019), the authors of that work collected a dataset by crawling news websites between the years 2013 and 2018 in order to evaluate their model’s performance across different years.

FakeNewsNet: A fake news repository that consists of two comprehensive datasets, one collected using claims from PolitiFact and the other from the GossipCop fact-checking website. Given the large number of true and false claims from these two fact-checking websites, Shu et al. (2018) built news datasets that contain visual and textual news article content as well as social media information, obtained by searching Twitter for users who shared the news. Out of all the collected information, we use only the textual content of the news articles, which is the part we are interested in.

5 Experiments

Experimental setup. We split the articles’ text into N segments and set the maximum length of segments to 800 words, applying zero padding to those shorter than 800 words. Concerning the FakeFlow hyper-parameters, we tune various parameters (dropout, the size of the dense layers, activation functions, CNN filter sizes and their numbers, pooling size, size of the GRU layer, and the optimization function; see Appendix A for the search space) using early stopping on the validation set. In addition to these hyper-parameters, we also use the validation set to pick the best number of segments (N). For the MultiSourceFake dataset, we use 20% of the training part for validation. We represent words using pre-trained word2vec GoogleNews-300 embeddings9. For evaluation, we follow the setup from related work: we report accuracy and weighted precision, recall and F1 score, and macro F1 for those datasets where the classes are imbalanced.

9 https://code.google.com/archive/p/word2vec/

Baselines. To evaluate the performance of our model, we compare against a combination of fake news detection models and deep neural network architectures:

• CNN, LSTM: We use CNN and LSTM models and validate their performance when treating each document as one fragment. We experiment with different hyper-parameters and report results for the ones that performed best on the validation set.

• HAN: The authors of Yang et al. (2016) proposed a Hierarchical Attention Network (HAN) model for long document classification. The proposed model consists of two levels of attention mechanisms, i.e., word and sentence attention. The model splits each document into sentences and learns sentence representations from words.

• BERT: A text representation model that has shown superior performance on multiple natural language processing (NLP) benchmarks (Devlin et al., 2019). We use the pre-trained bert-base-uncased version, which has 12 layers and yields output embeddings with a dimension of 768. We feed the hidden representation of the special [CLS] token, which BERT uses to summarize the full input sequence, to a softmax layer (see the sketch after this list). Experimentally, we found that fine-tuning BERT’s layers gives higher performance. It is worth mentioning that BERT’s input length is limited to 512 word pieces (sub-word level) (Devlin et al., 2019); thus, we discard the rest of the text in long news articles.

• Fake News Detection Models: We compare our model to several fake news detection models: the Horne and Adali (2017) model, FakeNewsDetector (Pérez-Rosas et al., 2018), the Rashkin et al. (2017) model, and EIN (Ghanem et al., 2020).10

• Longformer: Given that Transformer-based models (e.g., BERT) are unable to process long sequences, we use Longformer (Beltagy et al., 2020), a SOTA model for long document tasks. In our experiments, we set the maximum sequence length to 1,500 to handle documents that have more than 512 tokens in the MultiSourceFake dataset (see Figure 2). We also found that fine-tuning the Longformer model gives better results and much faster convergence.
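As referenced in the BERT item above, the following is a minimal sketch of that baseline using the Hugging Face transformers API as one possible implementation; the authors’ original code may differ.

```python
# Minimal sketch of the BERT baseline: truncate the article to 512 word
# pieces, take the [CLS] hidden state, and classify with a softmax layer.
# Uses the Hugging Face API as one possible implementation.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(768, 2)  # fake vs. real

text = "Some long news article ..."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
cls_vector = bert(**inputs).last_hidden_state[:, 0]  # hidden state of [CLS]
probs = torch.softmax(classifier(cls_vector), dim=-1)
```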

6 Results and Analysis

Table 2 presents the results of our proposed model and the baselines on the MultiSourceFake dataset. Our best result was achieved using 10 segments (N, as found on the validation data). In Figure 3 we show the model’s performance for different numbers of segments.11 In general, the results show that models based on either word n-grams or word embeddings perform better than models that use handcrafted features, e.g., Horne and Adali (2017). Also, despite the huge amount of data used to train the BERT model, the results show that BERT performs worse than FakeFlow and also fails to outperform some of the other models. We speculate that this is due to the fact that the input length in BERT is limited to 512 word pieces, as mentioned previously, and a large portion of the news articles in the MultiSourceFake dataset is longer than 512 words. The results of the Longformer model confirm our claim regarding document length and show a significantly higher F1 score than the BERT model.

10 We only compare to TopicAgnostic on the dataset its authors proposed (PoliticalNews).

11 In the case of N=1 in Figure 3, we set the maximum segment length to 1,500 words instead of 800 so as not to lose parts of the longer articles.

Figure 3: Accuracy and F1 results of the FakeFlow model using different N (number of segments).

Model                       Acc.  Prec.  Rec.  F1 macro
Majority Class              0.59  0.35   0.59  0.37
Horne and Adali (2017)      0.80  0.75   0.78  0.80
FakeNewsDetector            0.86  0.86   0.86  0.86
LSTM                        0.91  0.86   0.91  0.90
CNN                         0.91  0.89   0.89  0.91
Rashkin et al. (2017)       0.92  0.92   0.92  0.92
BERT                        0.93  0.93   0.94  0.93‡
EIN                         0.93  0.94   0.93  0.93‡
HAN                         0.94  0.94   0.94  0.93‡
Longformer                  0.97  0.97   0.97  0.97†
FakeFlow                    0.96  0.93   0.97  0.96
FakeFlow – Topic only       0.91  0.89   0.90  0.90
FakeFlow – Affective only   0.61  0.38   0.60  0.40

Table 2: Results on the MultiSourceFake dataset. (‡) indicates a statistically significant improvement of FakeFlow over the referenced model using McNemar’s test; (†) indicates no statistically significant improvement over FakeFlow.

This emphasizes that, despite BERT’s strong performance on multiple NLP benchmarks, it is unable to handle long text documents, in contrast, e.g., to vanilla text categorization (Adhikari et al., 2019). In addition, Longformer shows a higher F1 score than the FakeFlow model, yet the difference is statistically insignificant.

To isolate the contribution of topical vs. affective information, we run two simplified versions of our architecture, each consisting of only the network capturing topical or affective information. The results show that the flow of affective information performs weakly when used alone; this emphasizes that the affective information of a news article is a meaningful, yet complementary, source of information.

Performance on Multiple Datasets. In Table 3 we compare the performance of the FakeFlow model to SOTA results on the other datasets introduced in Section 4. The TruthShades dataset has two test sets, in-domain and out-of-domain: in the in-domain configuration, training and test articles come from the same sources, while in the out-of-domain configuration they come from different sources. The results demonstrate that FakeFlow achieves a better F1 on both test sets.


TruthShades               Acc.  Prec.  Rec.  F1 macro
Out-of-domain
  Rashkin et al. (2017)   0.67  0.70   0.67  0.65
  FakeFlow                0.68  0.69   0.68  0.68
In-domain
  Rashkin et al. (2017)   0.91  0.91   0.91  0.91
  FakeFlow                0.96  0.96   0.96  0.96

PoliticalNews             Acc.  Prec.  Rec.  F1 weighted
  TopicAgnostic           0.87  0.87   0.87  0.87
  FakeFlow                0.88  0.88   0.88  0.88

FakeNewsNet               Acc.  Prec.  Rec.  F1 weighted
  FakeNewsTracker         0.80  0.82   0.75  0.79
  One-Hot LR              0.82  0.90   0.72  0.80
  FakeFlow                0.86  0.86   0.86  0.85

Table 3: Results on multiple datasets. We compare the FakeFlow model to SOTA models on each dataset.

Similarly, the results on the PoliticalNews dataset show that FakeFlow also outperforms the TopicAgnostic model, although the gap is not very large. Finally, on the FakeNewsNet dataset, the deep learning-based model (FakeNewsTracker) does not perform well compared to the authors’ other proposed baseline, a Logistic Regression (LR) classifier over one-hot vectors of the news articles’ text. It thus appears that a simple word-based model works better than a more sophisticated model that incorporates social media and context information. The FakeFlow model, on the other hand, achieves a better result, outperforming both FakeNewsTracker and the LR baseline.

Topic-Aware Model. New events are constantly covered by news agencies, and these events differ from older ones in terms of discourse and topic. Therefore, a fake news detector trained on news articles from years back may be unable to detect recent news. In this experiment, we evaluate our approach on the PoliticalNews dataset, which is constructed from news distributed across different years (2013 to 2018). Following the experimental setup in Castelo et al. (2019), we train the FakeFlow model on news from one year and test on the other years, one year at a time; for example, we train the model on news from 2013 and test on news from 2015. Note that each test set is thus associated with 5 results, one for each training year. Figure 4 shows the average accuracy for each test set. We compare FakeFlow to the TopicAgnostic model, which proved to be effective at detecting fake news from different years.

Figure 4: Results of the topic-aware experiment.

It is worth mentioning that the features of the TopicAgnostic model were extracted from both the headlines and the text of the news articles. However, the results show that both models perform similarly, except on the 2013 test set, where FakeFlow achieves a higher accuracy by a margin of 7%. The experiment shows that FakeFlow is capable of detecting fake news from different years, with a flat performance across the years.

Attention Weights. The proposed FakeFlow model shows that taking into account the flow of affective information in fake news is an important perspective for fake news detection. We argue that being able to better understand the behaviour of the model can make it more transparent to end users. Figure 5 illustrates this by showing the attention weights of a fake news article across the 10 segments (left bar).12 The figure shows that FakeFlow attends more to the beginning of the article. For a better understanding, we match the affective information with the attention weights. For the news text in the figure, the emotion features13 show a clear example of how fake news articles try to manipulate the reader. It appears that the presence of the fear, sadness, and surprise emotions at the beginning of the article has triggered the attention on this part. Towards the end of the article, on the other hand, such negative emotions are absent, while emotions like joy and anticipation appear. This exemplifies how fake news tries to attract the readers’ attention in the first part of the text. Regarding the morality features, we only match the word “kill” with the harm category. For the hyperbolic feature, we match the words “terrifying” and “powerful”. In the same manner, both morality and hyperbolic features match words that occur at the beginning of the article.

12 We averaged the attention weight matrix along the timestep (number of segments) representations.

13 Words with multiple colors have been annotated with multiple emotion types in the NRC lexicon.


Lastly, for both the sentiment and imageability features, we are unable to find a clear interpretation in this example, where many words across the segments match.

Real vs. Fake Analysis. In Table 4 we present an analysis of both real and fake news articles. The analysis gives the reader an intuition of the distribution of the used features across the articles’ segments. It shows that an emotion like fear has, on average, a larger difference between the first and last segments in fake news than in real news (see Figure 6 for a visualized distribution). Also, a feature like hyperbolic has a higher average value and a lower standard deviation across all segments for fake news than for real news, indicating that fake news has a higher amount of hyperbolic words with similarly high values across segments.

Figure 6: The flow of the Fear emotion in fake and real news articles in the MultiSourceFake dataset. The Y-axis presents the average number of Fear emotion words on a 0-1 scale; the X-axis presents the document text, divided into 10 segments.
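The statistics in Table 4 can be reproduced per feature with a short computation such as the one below; the random matrix here stands in for real per-segment feature values.

```python
# Minimal sketch of the Table 4 statistics for one feature (e.g. Fear):
# mean in the first/last segment and mean/std of the flow across all segments.
import numpy as np

def segment_stats(values):
    """values: (n_articles, 10) array of a feature's per-segment values."""
    per_segment_mean = values.mean(axis=0)  # the feature's flow curve
    return {
        "mu_first": values[:, 0].mean(),
        "mu_last": values[:, -1].mean(),
        "mu_all": values.mean(),
        "sigma_all": per_segment_mean.std(),  # spread of the flow over segments
    }

fear_fake = np.random.rand(100, 10)  # toy data in place of real feature values
print(segment_stats(fear_fake))
```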

7 Conclusion

In this paper we presented FakeFlow, a model that takes into account the flow of affective information (emotions, sentiment, hyperbolic words, etc.) in texts to better detect fake news articles. The model receives as input a text segmented into smaller units, instead of processing one long fragment. This enables it to learn the flow of affective information by modeling the interaction between the topic and affective terms in the news article. We evaluated our model on four different datasets and compared it to several strong baselines. The extensive experiments show the effectiveness of FakeFlow over state-of-the-art models. Although FakeFlow was trained using a limited amount of text, the results demonstrate that it achieves results on par with resource-hungry models (e.g., BERT and Longformer). In future work, we plan to extend our dataset and study more fine-grained news types, e.g., propaganda, from an emotional perspective. Moreover, we plan to investigate how to replace the lexicon-based information with language-independent approaches in an attempt to make our model multilingual.

Acknowledgment

The first author would like to thank Ines Rehbein and Ana Uban for their valuable comments and suggestions. The work of the third author was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), and by the Generalitat Valenciana under the research project DeepPattern (PROMETEO/2019/121).

References

Ashutosh Adhikari, Achyudh Ram, Raphael Tang, and Jimmy Lin. 2019. DocBERT: BERT for Document Classification. arXiv preprint arXiv:1904.08398.

Ahmet Aker, Leon Derczynski, and Kalina Bontcheva. 2017. Simple Open Stance Classification for Rumour Analysis. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 31–39.

Alberto Barrón-Cedeño, Giovanni Da San Martino, Israa Jaradat, and Preslav Nakov. 2019. Proppy: A System to Unmask Propaganda in Online News. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9847–9848.

Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arXiv preprint arXiv:2004.05150.

Sonia Castelo, Thais Almeida, Anas Elghafari, Aécio Santos, Kien Pham, Eduardo Nakamura, and Juliana Freire. 2019. A Topic-Agnostic Approach for Identifying Fake News Pages. In Companion Proceedings of The 2019 World Wide Web Conference, pages 975–980.

Abhijnan Chakraborty, Bhargavi Paranjape, Sourya Kakarla, and Niloy Ganguly. 2016. Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pages 9–16.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.


Figure 5: Emotional interpretation of a fake news article, showing the attention weights (the bar on the left) and highlighting the emotions in the text.

Features          Real News                          Fake News
                  µ first  µ last  µ all  σ all      µ first  µ last  µ all  σ all

Emotions
  Anger           0.175    0.167   0.170  0.003      0.183    0.170   0.171  0.008
  Anticipation    0.301    0.315   0.264  0.025      0.293    0.305   0.260  0.022
  Disgust         0.095    0.101   0.095  0.004      0.096    0.091   0.091  0.007
  Fear            0.254    0.250   0.238  0.010      0.265    0.226   0.238  0.011
  Joy             0.217    0.226   0.183  0.021      0.207    0.203   0.175  0.020
  Sadness         0.161    0.158   0.160  0.006      0.155    0.155   0.158  0.007
  Surprise        0.140    0.144   0.123  0.012      0.142    0.123   0.120  0.008
  Trust           0.446    0.466   0.400  0.031      0.461    0.421   0.401  0.029

Sentiment
  Positive        0.599    0.623   0.558  0.030      0.608    0.591   0.554  0.032
  Negative        0.369    0.337   0.347  0.011      0.367    0.336   0.350  0.013

Morality
  Harm            0.007    0.011   0.007  0.002      0.008    0.013   0.007  0.002
  Care            0.026    0.023   0.019  0.004      0.021    0.022   0.019  0.003
  Fairness        0.003    0.013   0.007  0.002      0.005    0.020   0.009  0.004
  Unfairness      0.000    0.000   0.001  0.000      0.001    0.000   0.001  0.001
  Loyalty         0.016    0.017   0.019  0.002      0.014    0.016   0.019  0.003
  Betrayal        0.004    0.003   0.005  0.001      0.002    0.003   0.004  0.001
  Authority       0.025    0.032   0.026  0.003      0.024    0.028   0.026  0.002
  Subversion      0.005    0.004   0.004  0.001      0.006    0.007   0.005  0.002
  Sanctity        0.005    0.005   0.004  0.001      0.005    0.006   0.005  0.002
  Degradation     0.003    0.004   0.003  0.001      0.006    0.004   0.003  0.001

Imageability
  Imageability    0.845    1.203   1.144  0.122      0.877    1.184   1.145  0.124
  Abstraction     0.424    0.331   0.352  0.028      0.382    0.304   0.342  0.037

Hyperbolic        0.042    0.05    0.045  0.005      0.046    0.044   0.047  0.003

Table 4: A quantitative analysis of the features’ presence across articles’ segments. We present the average value in the first segment (µ first seg.), the average value in the last segment (µ last seg.), the average value over all 10 segments (µ all seg.), and the standard deviation of a feature across the 10 segments (σ all seg.), both in real and fake news.


Bilal Ghanem, Alessandra Teresa Cignarella, Cristina Bosco, Paolo Rosso, and Francisco Manuel Rangel Pardo. 2019. UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 1125–1131.

Bilal Ghanem, Paolo Rosso, and Francisco Rangel. 2020. An Emotional Analysis of False Information in Social Media and News Articles. ACM Transactions on Internet Technology (TOIT), 20(2):1–18.

Jesse Graham, Jonathan Haidt, and Brian A. Nosek. 2009. Liberals and Conservatives Rely on Different Sets of Moral Foundations. Journal of Personality and Social Psychology, 96(5):1029–1046.

Benjamin D. Horne and Sibel Adali. 2017. This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News. In Eleventh International AAAI Conference on Web and Social Media, pages 759–766.

Sudipta Kar, Suraj Maharjan, and Thamar Solorio. 2018. Folksonomication: Predicting Tags for Movies from Plot Synopses Using Emotion Flow Encoded Neural Network. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2879–2891.

Georgi Karadzhov, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. 2017. Fully Automated Fact Checking Using External Sources. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 344–353.

Suraj Maharjan, Sudipta Kar, Manuel Montes-y-Gómez, Fabio A. González, and Thamar Solorio. 2018. Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books. arXiv preprint arXiv:1805.09746.

Saif M. Mohammad and Peter D. Turney. 2010. Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 26–34. Association for Computational Linguistics.

Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. Automatic Detection of Fake News. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3391–3401, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2931–2937.

Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds. 2016. The Emotional Arcs of Stories Are Dominated by Six Basic Shapes. EPJ Data Science, 5(1):1–12.

Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019a. dEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 395–405.

Kai Shu, Deepak Mahudeswaran, and Huan Liu. 2019b. FakeNewsTracker: A Tool for Fake News Collection, Detection, and Visualization. Computational and Mathematical Organization Theory, 25(1):60–71.

Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media. arXiv preprint arXiv:1809.01286.

Yla R. Tausczik and James W. Pennebaker. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1):24–54.

Michael Wilson. 1988. MRC Psycholinguistic Database: Machine-Usable Dictionary, Version 2.00. Behavior Research Methods, Instruments, & Computers, 20(1):6–10.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.

Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, and Feifei Li. 2018. OpenTag: Open Attribute Value Extraction from Product Profiles. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1049–1058.

Xinyi Zhou and Reza Zafarani. 2018. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv preprint arXiv:1812.00315.

Dimitrina Zlatkova, Preslav Nakov, and Ivan Koychev. 2019. Fact-Checking Meets Fauxtography: Verifying Claims About Images. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2099–2108.

Arkaitz Zubiaga, Maria Liakata, Rob Procter, Kalina Bontcheva, and Peter Tolmie. 2015. Towards Detecting Rumours in Social Media. In AAAI Workshop: AI for Cities, pages 35–41.


A Appendices

A.1 Hyper-parameters

For the FakeFlow hyper-parameters, we tune the following parameters over their corresponding search spaces:

• Dropout: random selection in the range [0.1, 0.6],

• Dense layers: [8, 16, 32, 64, 128],

• Activation functions: [selu, relu, tanh, elu],

• CNN filters’ sizes: [(2, 3, 4), (3, 4, 5), (4, 5, 6), (3, 5), (2, 4), (4,), (5,), (3, 5, 7), (3, 6)],

• Numbers of CNN filters: [4, 8, 16, 32, 64, 128],

• Pooling size: [2, 3],

• GRU units: [8, 16, 32, 64, 128],

• Optimization function: [adam, adadelta, rmsprop, sgd],

• For early stopping, we set the ‘patience’ parameter to 4 and the number of epochs to 50.

For parameter selection, we use the hyperopt14 library, which takes the above search space and randomly selects N different parameter combinations (trials). We use a small value of N in all of our experiments to avoid overdrawn fine-tuning; we set N to 35.
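A minimal sketch of this random search with hyperopt is shown below. `train_and_validate` is a hypothetical training routine, and only part of the search space is reproduced.

```python
# Minimal sketch of the random search over the Appendix A.1 space using
# hyperopt; train_and_validate is a hypothetical training routine, and only
# part of the search space is reproduced here.
from hyperopt import fmin, hp, Trials
import hyperopt.rand

space = {
    "dropout": hp.uniform("dropout", 0.1, 0.6),
    "dense_size": hp.choice("dense_size", [8, 16, 32, 64, 128]),
    "activation": hp.choice("activation", ["selu", "relu", "tanh", "elu"]),
    "gru_units": hp.choice("gru_units", [8, 16, 32, 64, 128]),
    "optimizer": hp.choice("optimizer", ["adam", "adadelta", "rmsprop", "sgd"]),
}

def objective(params):
    return train_and_validate(params)  # hypothetical: returns validation loss

trials = Trials()
best = fmin(objective, space, algo=hyperopt.rand.suggest, max_evals=35, trials=trials)
print(best)
```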

A.2 Topic-Aware Experiments

In Figure 4, we present the average accuracy of our model when training on different years and testing on a specific one. In the following we show the results before averaging.

Train \ Test   2013  2014  2015  2016  2017  2018
2013           0.00  0.82  0.74  0.76  0.78  0.74
2014           0.84  0.00  0.79  0.76  0.81  0.74
2015           0.79  0.81  0.00  0.82  0.80  0.82
2016           0.80  0.76  0.87  0.00  0.85  0.79
2017           0.79  0.82  0.76  0.80  0.00  0.85
2018           0.79  0.75  0.81  0.83  0.83  0.00

Average        0.80  0.79  0.79  0.79  0.81  0.79

Table 5: FakeFlow results for each train-test run in the topic-aware experiment.

14 https://github.com/hyperopt/hyperopt