Insights, Analysis and Prediction of Verified Users on Twitter

Thesis submitted in partial fulfillment of the requirements for the degree of

MS by Research in

Computer Science and Engineering

by

Indraneil Paul

201302170

[email protected]

Information Retrieval and Extraction Lab, Language Technologies Research Institute

International Institute of Information Technology

Hyderabad - 500 032, INDIA

June 2019

Copyright © Indraneil Paul, 2019

All Rights Reserved

International Institute of Information Technology

Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled “Insights, Analysis and Prediction of Verified Users on Twitter” by Indraneil Paul, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date Prof. Ponnurangam Kumaraguru

To Family and Near Ones

Acknowledgments

I want to extend my heartiest gratitude to my thesis advisor Dr Ponnurangam Kumaraguru, for taking me under his mentorship at a juncture when I was disillusioned with the ivory towers of academia. He did so forthwith and has subsequently provided me with constant support and encouragement to explore and chart my own research path. The freedom he has afforded me has been liberating, and the guidance I have acquired from him has been the most cherished part of my research journey. His interactions with his research students are a lesson in management on how to resist the temptation to micromanage without sacrificing productivity. I would also like to thank Dr Manish Gupta for agreeing to collaborate with us and co-guide us. I will forever reminisce about the involved discussions on statistics and inference approaches that we had en route to completing the project. I cannot thank him and Microsoft India enough for lending us access to their commercial social media data, which made such a large-scale research undertaking possible. Both my advisors have always been the primary source of inspiration to learn ethics in research.

I would also thank Dr Sujit Gujar for stimulating my interest in game theory and mechanism design. Our lengthy discussions on two-sided market matching and dynamic markets are some of the highlights of my final year in college. My subsequent research project, under his guidance, has been a source of never-ending intellectual fodder.

The six years that I have spent at this institution have been some of my most defining. These years have forged me into a tougher self through relentless academic pressure, adversity and intellectual trials. Standing by me through it all have been my friends who have taught me to live in the moment and cherish the individual instances of camaraderie. I would thank Nair for being a sounding board for all my idiosyncratic plans and nonsensical schemes as well as for being of invaluable help when needed in my research. I would also like to thank Arpan and Anuj for being diligent roommates, Parth for being a walking reference for all things CS, Ghosh for being an intellectual foil and Abhijeet for his gyaan during placements. I thank Romil for being his pedantic self, Saksham for his unforgettable get-rich schemes, Shubham for being my quondam CVIT bro, Aman for personifying the IIIT maxim of lite and Dhruva, Shukla and Moneish for the memorable football talks we had.

My journey wouldn't have been as successful as it is without my family. My parents have been very supportive of all my endeavours and have let me learn life's lessons the hard way when needed. My father has been steadfast even in the face of my initial forays into the world of research falling apart and exemplifies the never-give-up attitude to life that I hope to imbibe, while my mother has been an incessant source of hope and confidence and my greatest fan. I would also like to thank my uncle, cousin and grandmother for checking up on me periodically when I was down and out.

I want to finish by saying that any well-wisher who may have gone unmentioned can rest easy knowing that your help and goodwill have gone into making me who I am today.

Abstract

We live in an increasingly connected world, where directed links in social connectivity graphs could represent anything from longstanding friendships or aligning niche interests to a publisher-subscriber relationship quenching an insatiable appetite for real-time news or celebrity gossip. A pertinent way of describing the relation between social media and its users can be summarized by rescripting cultural anthropologist Michael Wesch’s quote:

“Social Media is Us/ing Us”

This alludes to the phenomenon where social mores and personal tastes shape the landscape of our online platforms, such as popular hashtags and trending topics, while these platforms simultaneously shape the attitudes and outlook of the people who use them. Social media platforms are just as amenable to being used for proselytizing impressionable users as they are to getting long ignored or suppressed viewpoints out in the open.

In the face of a runaway fake news problem, and in a world where a superior online presence could decide elections and policy matters and foment prejudices, it is unfortunate that the social interaction aspects of authenticity and prominence have become inextricably linked with one another and are frequently conflated. This is the result of a myriad of factors, including poor platform design and vacillating stances by social networks on the significance of platform badges, and the consequences can be drastic. It has been shown that the presence of authenticity markers next to user-generated content enhances its reach and credibility, while posts from highly regarded sources have been shown to be more likely to be perceived as plausible.

Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. There have been repeated assertions by platforms such as Twitter and Facebook that verification is not tantamount to endorsement. However, a significant body of prior work suggests that possessing a verified status symbolizes enhanced credibility in the eyes of the platform audience. As a result, such a station is highly coveted among public figures and influencers. Hence, we attempt to characterize the network of verified users on Twitter and compare the results to similar analyses performed for the entire Twitter network. We extracted the whole graph of verified users on Twitter (as of July 2018) and obtained 231,235 English user profiles and 79,213,811 connections. Subsequently, in the network analysis, we found that the sub-graph of verified users mirrors the full Twitter users graph in some aspects, such as possessing a short diameter. However, our findings contrast with earlier results on multiple fronts, such as the possession of a power-law out-degree distribution, slight disassortativity, and a significantly higher reciprocity rate, as elucidated in the thesis. Moreover, we attempt to gauge the presence of salient components within this sub-graph and detect the absence of homophily with respect to popularity, which again is in stark contrast to the full Twitter graph. Finally, we demonstrate stationarity in the time series of verified user activity levels.

It is against this backdrop that we attempt to deconstruct the extent to which Twitter’s verification policy mingles the notions of authenticity and authority. To this end, we seek to unravel the aspects of a user’s profile which likely engender or preclude verification. The aim of the thesis is two-fold: First, we test if discerning the verification status of a handle from profile metadata and content features is feasible. Second, we unravel the characteristics which have the most significant bearing on a handle’s verification status. We augmented our dataset with all the 494 million tweets of the aforementioned users over a one-year collection period along with their temporal social reach and activity characteristics. Our proposed models are able to reliably identify verification status (area under the curve, AUC > 99%). We show that the number of public list memberships, presence of neutral sentiment in tweets and an authoritative language style are the most pertinent predictors of verification status.

To the best of our knowledge, this work represents the first quantitative attempt at characterizing verified users on Twitter and also the first attempt at discerning and classifying verification-worthy users on Twitter.

Notation

This section provides a concise reference describing some of the commonly used terms in this document. The section serves as a useful glossary to uninitiated readers.

Twitter Platform

Verification: Twitter, Facebook, Instagram and other social media platforms have incorporated a verification process to authenticate account handles they deem important enough to be worth impersonating. This is usually only conferred to accounts of well-known public personalities and businesses.

Bio: A small public summary about oneself or one’s business displayed under their Twitter profile picture.

Friends: Accounts followed by a user. This subscribes an account to the Tweets of the account followed.

Followers: Accounts following a user.

Statuses/Tweets: Messages on the platform, visible to an account’s followers.

Re-tweets: Tweets composed by other accounts, which are broadcast by an account to its own followers.

Mentions: A mention is when someone uses the @ sign immediately followed by another Twitter account, thus referencing them in their Tweet.

Hashtag: A word or phrase preceded by a # sign, used on social media websites and applications, especially Twitter, to identify messages on a specific topic.

Public List: A curated group of Twitter accounts compiled by a user. Usually socially or professionally related accounts are grouped together in a single list.

Bot: Accounts that communicate more or less autonomously on social media, often with the task of influencing the course of discussion or the opinions of their readers.

API: An application programming interface (API) allows other web services and applications to integrate with Twitter. It provides endpoints to systematically query user, network and trending information from the platform.

Firehose: Commercial pipeline guaranteeing delivery of 100% of the tweets that match search criteria, provisioned for enterprise workloads. The full delivery guarantee differentiates this from the Twitter Streaming API solution, which only guarantees a partial, randomly sampled delivery.

Network

Homophily: Degree of the tendency of similar nodes in a network to be linked with one another.

Reciprocity: Proportion of network links which are bi-directional.

Power-Law: A power law is a special kind of mathematical relationship between two quantities. When the frequency of an event varies as a power of some attribute of that event, the frequency is said to follow a power law.

Betweenness Centrality: A measure of the influence of a node over the flow of information between every pair of nodes under the assumption that information primarily flows over the shortest paths between them.

PageRank Centrality: PageRank centrality works by counting the number and quality of links to a node to determine a rough estimate of how important it is. The underlying assumption is that more important nodes are likely to receive more links from other important nodes.

Diameter: Longest of all the calculated shortest paths in a network.

Effective Diameter: The 90th percentile of the shortest path lengths.
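As a minimal illustration of the network terms above, the sketch below computes reciprocity, diameter and effective diameter on a small directed graph. The edge list, the function names and the nearest-rank treatment of the 90th percentile are assumptions made purely for illustration; they are not the data or the procedure used in this thesis.

```python
import math
from collections import deque

# Hypothetical toy graph: a directed chain a -> ... -> f with one mutual link (a <-> b).
EDGES = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]


def reciprocity(edges):
    """Proportion of directed links whose reverse link is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def shortest_path_lengths(edges):
    """Finite shortest-path lengths over all ordered pairs of distinct nodes (BFS)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())
    lengths = []
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        lengths.extend(d for node, d in distance.items() if node != source)
    return lengths


def effective_diameter(lengths, quantile=0.9):
    """Nearest-rank percentile of the shortest-path lengths (an assumed convention)."""
    ordered = sorted(lengths)
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index]


lengths = shortest_path_lengths(EDGES)
print("reciprocity:", reciprocity(EDGES))
print("diameter:", max(lengths))
print("effective diameter:", effective_diameter(lengths))
```

On this toy graph the effective diameter comes out smaller than the diameter, because the single longest path is discounted as an outlier. Other conventions, such as interpolating between path lengths, would give slightly different values.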

Time Series

Autocorrelation: Correlation of a signal with a delayed copy of itself as a function of delay.

Changepoint: A changepoint is a point in a sequence of observations after which the inherent process that generates data has changed.

Evaluation

ROC AUC: Performance measurement for binary classification problems at various classification threshold settings using the area under the curve of the true positive rate plotted against the false positive rate.

F1 Score: Performance measurement for binary classification using a harmonic mean of precision and recall.
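The two evaluation measures above can be sketched in a few lines. The labels, scores, the 0.5 decision threshold and the function names below are hypothetical and chosen only for illustration. The ROC AUC is computed through its equivalent pairwise-ranking formulation (the probability that a randomly chosen positive is scored above a randomly chosen negative) rather than by tracing the curve itself.

```python
# Hypothetical ground truth (1 = verified, 0 = non-verified) and classifier scores.
Y_TRUE = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
Y_PRED = [1 if score >= 0.5 else 0 for score in SCORES]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


print("F1:", f1_score(Y_TRUE, Y_PRED))
print("ROC AUC:", roc_auc(Y_TRUE, SCORES))
```

The pairwise form is quadratic in the number of samples, so it suits small examples only; a sort-based or library implementation would be the natural choice at dataset scale.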

Miscellaneous

Captology: The study of computers as persuasive technologies.

Heuristic Model: Widely recognized communication model by Shelly Chaiken that attempts to explain how people receive and process persuasive messages using simplifying decision rules to assess the message content quickly.

LIWC Analytic: A summary statistic capturing the degree to which people use words that suggest formal, logical, and hierarchical thinking patterns.

LIWC Authentic: A summary statistic of genuineness, humility and vulnerability in expressed speech. Derived from a series of studies where people were induced to be honest or deceptive as well as a summary of deception studies published subsequently.

LIWC Clout: A summary statistic representing relative social status, confidence, or leadership that people display through their writing or talking.

LIWC Tone: A summary statistic of both positive and negative psychometric emotions.

Contents

Acknowledgement

Abstract

Notation

1 Introduction
  1.1 Overview
  1.2 A Brief and Controversial History of Twitter Verification
  1.3 Challenges
  1.4 Contributions
  1.5 Thesis Layout

2 Background and Methods Used
  2.1 Overview
  2.2 Time Series Analysis
    2.2.1 Tests for Stationarity
    2.2.2 Tests for Auto-correlation
    2.2.3 Changepoint Detection
  2.3 Inferring Heavy-Tailed Distributions
    2.3.1 Power Law MLE
    2.3.2 Likelihood Ratio Test
  2.4 Automation Scores
  2.5 Node Importance Measures
    2.5.1 Betweenness Centrality
    2.5.2 PageRank Centrality
  2.6 Fixing class imbalance
    2.6.1 ADASYN
    2.6.2 SMOTETomek
  2.7 Linguistic Inquiry and Summary Scores

3 Dataset
  3.1 Overview
  3.2 User Metadata
  3.3 Content Features
  3.4 Temporal Features
  3.5 Miscellaneous Features
  3.6 Rectifying Class Imbalance
  3.7 Conclusion

4 Characterizing Verified Users
  4.1 Overview
    4.1.1 Motivation
  4.2 Related Work
  4.3 Network Analysis
    4.3.1 Basic Analysis
    4.3.2 Degree and Eigenvalue Distribution
    4.3.3 Reciprocity
    4.3.4 Degrees of Separation
    4.3.5 Verified User Bios
    4.3.6 Centrality
  4.4 Activity Analysis
  4.5 Conclusion and Future Work

5 Discerning Verified Users
  5.1 Overview
    5.1.1 Motivation
    5.1.2 Research Questions
    5.1.3 Contributions
  5.2 Related Work
  5.3 Inferring Verified Status
    5.3.1 Feature Importance Analysis
    5.3.2 Clustering and characterization
  5.4 Comparative Topical Usage Analysis
    5.4.1 Content Topics
    5.4.2 Topical Span
  5.5 Conclusion

6 Conclusions
  6.1 Overview
  6.2 Applications
  6.3 Future Work
  6.4 Limitations

Related Publications

Bibliography

List of Figures

1.1 A polarizing Tweet from a suspected troll account.

1.2 Impersonation Attempts on Jennifer Lawrence, a Celebrity Well Known for not Possessing an Official Social Media Handle.

1.3 Influential UK Political Accounts that Remain Unverified Despite Confirmed Verification Requests.

1.4 Twitter’s Public Backpedal on their Verification Stance.

1.5 Actor Michael Ian Black Questioning the Verification of Jason Kessler.

1.6 The Suspended Twitter Verification Request Form.

2.1 ADASYN Class Imbalance Correction with Synthetic Samples Mirroring Minority Class Density.

2.2 SMOTETomek Class Imbalance Correction with Tomek Links Highlighted.

3.1 Distribution of Friends, Followers, Public List Memberships and Tweet Activity.

4.1 Log-Log Scaled Distribution of Proportion of Users to Out-Degree.

4.2 Log Scaled Distribution of Number of Node Pairs vs Degrees of Separation. Despite Average Distances being Low, a Large Number of Pairs Exceed The Previously Speculated Six-Degrees of Separation.

4.3 Wordcloud of Most Frequent Unigrams in Bios of Verified Users.

4.4 Log-Log Scaled Scatter Plots of Various Influence Measures. The Regression Splines and 95% Confidence Intervals are computed Using a Generalized Additive Model.

4.5 Calendar Maps for Verified User Tweet Activity Levels Over our One-Year Collection Period.

5.1 Normalized density estimations of the six most discriminative features for verified (blue) and non-verified users (red).

5.2 t-SNE embeddings of accounts coloured by cluster. The distribution of verification probabilities by cluster, as predicted by our classifier, are faceted on the right.

5.3 Normalized density estimations of usage for the six most discriminative topics for verified (blue) and non-verified users (red). Listed alongside are the top three most probable keywords for each topic.

5.4 Square-root scaled proportion of users by optimal number of topics.

List of Tables

3.1 List of features extracted per user by our framework.

4.1 Most Popular Bigrams in Bios of Verified Users.

4.2 Most Popular Trigrams in Bios of Verified Users.

5.1 Summary of classification performance of various approaches using metadata, temporal and contextual features on the original and balanced datasets.

5.2 Classification performance of our most competitive model broken down by cluster.

5.3 Summary of classification performance of various approaches on inferred topics.

Chapter 1

Introduction

Chapter Guide

1.1 Overview
1.2 A Brief and Controversial History of Twitter Verification
1.3 Challenges
1.4 Contributions
1.5 Thesis Layout

1.1 Overview

We live in an era when our news diet is inundated with discourse about the perils of fake news, impersonation and networks of autonomous social media agents trans-nationally affecting elections, and for good reason. Social media has increasingly lent itself as an amenable means for malicious state and non-state actors to usher in societal changes desirable to them. These fears were confirmed in full public view when international attempts at using Twitter and Facebook to steer public opinion during the 2016 US elections became apparent [50, 98] and the US Department of Justice charged 13 defendants for the same [23].

The efforts of international state actors spanned the gamut of polarising opposing voting blocs, partitioning established voting blocs, suppressing minority election turnout and the like. The efforts included hundreds of thousands of posts with pro-Afro-American hashtags in similar Facebook groups designed to highlight the racial divide in the nation. Supplementing this effort was the creation of multiple popular hashtags and groups on social media masquerading as being of Afro-American origin, and the use of these assets to misdirect voters about voter registration, whom to vote for and candidate policies [99].

Figure 1.1: A polarizing Tweet from a suspected troll account.

Additionally, the effort also attempted to arouse conservative support from voting blocs where Trump was expected to underperform, such as devoutly religious Evangelicals and veterans. This was successfully done by generating an apocalyptic narrative regarding delinquent minorities targeting the police forces, who were in turn hamstrung by the Democrats in their options to protect themselves, hijacking the hashtag #bluelivesmatter in the process. These campaigns are further complicated by the presence of ancillary misinformation designed to instil doubt in people and lend credence to otherwise incredible claims. An example of this was the #pizzagate hashtag, alleging that a pizzeria in Washington, D.C. was fronting a child trafficking operation of an international scale, led by the Democratic nominee. This, in turn, incited death threats towards the owner of the pizzeria as well as violent vigilantes looking to exact extra-judicial justice [53].

https://twitter.com/hashtag/bluelivesmatter

https://twitter.com/hashtag/pizzagate

In addition to the pertinent fear of misinformation, users today also need to account for influence campaigns where networks of connected autonomous agents make a concerted effort to make an unpopular candidate or point of view appear more widely supported than it is. This apparently inflated support exploits the endorsement heuristic that has been proposed in prior work relevant to the heuristic model for content consumption [16, 105], wherein users online resort to simplifying heuristics to quickly summarise and draw inferences from content online when content is abundant [42]. These astroturfing campaigns, as they are named, attempt to make unpopular content appear viral by means of retweet farms on Twitter, with the hope being that when an adequate threshold of virality is surpassed, people perceive it as being credible and, in some extreme cases, domain experts on the platform feel that it is appropriate to retweet it themselves [33].

Therefore, it comes as no surprise that there has been extensive research into the defining attributes of fake news on Twitter [124], its spread [100] and users' exposure to it [41]. This increased social currency associated with misinformation has necessitated a means of verifying content provenance on platforms like Twitter. Most social media platforms, including the popular ones like Twitter, Facebook and Instagram, have responded to this need with a two-pronged approach of account verification and content moderation. It is into the inner workings of the first of these measures, verification, that this thesis delves.

1.2 A Brief and Controversial History of Twitter Verification

(a) Imposter Accounts on a Jennifer Lawrence Instagram Search

(b) An Example of an Imposter Account Posting only Public Photos

Figure 1.2: Impersonation Attempts on Jennifer Lawrence, a Celebrity Well Known for not Possessing an Official Social Media Handle.

Going hand-in-hand with the aforementioned problem of fake content and misattribution of information on Twitter is the likelihood of celebrities and authorities being impersonated on the platform. This opens up opportunities for trolls and hired agents to peddle false information in an authoritative manner, as well as to lend credence to content posted online by other malicious actors by retweeting it through an imposter account. The landscape is complicated by the possibility of further defamation due to imposter accounts peddling petty financial scams as well as messaging unsuspecting fans for unsavoury encounters, as evidenced by abundant tweets under the #scammers hashtag.

(a) A Highly Followed Unofficial NHS Staff Twitter Handle

(b) The Popular British Election Handle

Figure 1.3: Influential UK Political Accounts that Remain Unverified Despite Confirmed Verification Requests.

It is against this backdrop of celebrity impersonation that Twitter started its account verification program back in 2009, when it was sued by sports manager Tony La Russa, who didn't appreciate an imposter account passing sports judgements in his name. Twitter executive Biz Stone also confirmed that criticism on the same by Kanye West had a hand in the matter. However, in the initial rollout of the verification program, the public got its first indication that the manner in which Twitter had rolled out the program might be misunderstood. They stated that the presence of a verified badge implied that

“we’ve been in contact with the person or entity the account is representing and verified that it is approved”

https://twitter.com/hashtag/scammers

However, due to human resource constraints, the verification process was started from a very selective seed set of accounts and users deemed pre-eminent enough to impersonate. The original statement also slightly backtracked towards the end, saying that the absence of a verified badge did not imply that an account or its associated content was fake, adding that the vast majority of unverified accounts on Twitter are genuine. If verification was meant as a universal indicator only of authenticity, this selective approach was bound to be misinterpreted. After the beta phase of this system, Twitter released a FAQ [113] stating that the platform

“proactively verifies accounts on an ongoing basis to make it easier for users to find who they’re looking for and does not accept requests for verification from the general public”

This selective and subjective rollout opens the door for doubt regarding how these impersonation-worthy users are determined. The most natural proxy for measuring influence on Twitter can be seen as the number of followers a user has. However, the Twittersphere abounds with examples of genuinely influential political and business accounts that have a high following but are as yet unverified, despite repeated requests to Twitter for the same [49]. The subtext of these incidents compounded to give users the impression that the verified badge was not just a marker of authenticity but meant something more. Just as this point of view was gaining popularity, Twitter executive Jack Dorsey attempted to double down on the platform's initial stance, but further complicated perceptions by stating that an account is only granted verification

“if it is determined to be of public interest and verification does not imply an endorsement”

However, this confusion was apparent by late 2016, when Twitter denied multiple well-known right-leaning public figures a verified badge, eventually extending the same treatment to Julian Assange. Despite his work releasing state secrets being illegal as well as controversial, the truth remained that the handle @JulianAssange belonged to the man himself. Similarly, Twitter revoked notable provocateur Milo Yiannopoulos’ verified status after flagging some of his content as offensive. Incidents like these provided strong evidence contravening Twitter's official stance on verification while cementing the perception that verification was indeed a marker of prominence and a form of endorsement.

Figure 1.4: Twitter’s Public Backpedal on their Verification Stance.

https://twitter.com/julianassange_

Matters reached a head when several left-leaning celebrities on the platform publicly called out the CEO for granting verified status to political extremists and white supremacists such as Jason Kessler and openly threatened to quit the platform, in a Tweet which received over 11,000 retweets. This was an open demonstration of how the general Twitter user had, over time, come to assign a vastly different meaning and value to the verified badge compared to what Twitter had first imagined. This realisation forced Twitter into a quick and humiliating public backpedal, wherein they effectively admitted that the rollout of the verification system had been faulty, in a manner which allowed users to assign credence and authority to the authenticity the badge was meant to confer, in extreme cases enabling the owner to peddle it as a marker for expertise or importance.

Figure 1.5: Actor Michael Ian Black Questioning the Verification of Jason Kessler.

In addition to Twitter not clarifying quickly and clearly what the verified badge meant, the badge was grossly misinterpreted due to the following confounding factors:

• Being verified on Twitter makes attaining verification on other platforms more likely

• A verified badge likely lends additional credibility to an account and helps it acquire reach on the platform

• Verified accounts have access to special privileges such as additional analytics as well as the option of a separate dashboard summarising interaction with other verified users

In light of the aforementioned factors, it is no surprise that a verified badge on Twitter is as highly sought after as it is misunderstood. In an age where celebrities and brands vie to quantify and maximise authentic online reach [5, 10, 15, 58, 68, 69], attaining verified status on Twitter is a significant branding coup. Being cognizant of this, Twitter permanently suspended its verification request form shortly after its CEO Jack Dorsey briefly alluded to wanting to open up the process.

Having established that a verified status can make a substantial difference to the range and quality of engagement online, it is worth investigating means and methods to acquire it in a cost-effective manner.


Figure 1.6: The Suspended Twitter Verification Request Form.

In a world where social media management solutions can cost as much as $9,000 per month [28], possessing insights that help a user acquire verification can make a world of difference to the quality of the average Joe’s Twitter presence, without resorting to the aforementioned solutions. This thesis also delves into the factors that set verified users apart and, by unravelling factors that predict verification status, uncovers actionable insights that can be leveraged by the lay Twitter user.

1.3 Challenges

An overview of the challenges faced when characterising and discerning verified users on Twitter is as follows:

• Extracting the verified user network on Twitter is a lengthy undertaking, with the unfiltered network spanning nearly 298,000 nodes and 521 million edges

• Attempting an unbiased sampling of non-verified users on the platform, controlling for an appropriate measure of public interest so as to prevent obvious candidates who were likely part of the verification seed set from corrupting the analysis

• Dealing with substantial class imbalance, since verified users form only a small fraction of total users on the platform, and rectifying the said imbalance in a manner amenable to use with a variety of classifiers

• Identifying the right metrics and statistics that, in combination, allow us to capture all aspects of a user’s Twitter presence


• Capturing intangible aspects of a user’s behavioural patterns such as the topics they Tweet about, discourse diversity, the recent trajectory of activity and reach on the platform, etc.

• Understanding the relationship between verification status and influence on the Twitter platform

1.4 Contributions

A summary of the contributions of this work is as follows:

• Fully Featured Dataset: We release a fully-featured dataset of 407k+ users, containing 79+ million edges and 494+ million time-stamped Tweets.

• Successful Classification: We are the first study to successfully attempt discerning as well as classifying verification-worthy users on Twitter. We obtain a near-perfect classifier in the process.

• Actionable Findings: We unravel the aspects of a profile’s activity and presence that have the most demonstrable bearing on a user’s verification status.

1.5 Thesis Layout

The organisation of the rest of this thesis is outlined in this section. In Chapter 2, the algorithms used to arrive at our inferences are detailed, while in Chapter 3 we detail the data that was collected and motivate its inclusion as well as the collection methodology. In Chapter 4, we attempt to characterise verified users using established tools and strategies previously used on the entirety of the Twitter network, while Chapter 5 attempts to classify verification status from other aspects of a user’s Twitter presence while uncovering the factors most salient to the said task. Finally, we conclude with Chapter 6.


http://precog.iiitd.edu.in/requester.php?dataset=twitterVerified19

Chapter 2

Background and Methods Used

Chapter Guide

2.1 Overview
2.2 Time Series Analysis
    2.2.1 Tests for Stationarity
    2.2.2 Tests for Auto-correlation
    2.2.3 Changepoint Detection
2.3 Inferring Heavy-Tailed Distributions
    2.3.1 Power Law MLE
    2.3.2 Likelihood Ratio Test
2.4 Automation Scores
2.5 Node Importance Measures
    2.5.1 Betweenness Centrality
    2.5.2 PageRank Centrality
2.6 Fixing class imbalance
    2.6.1 ADASYN
    2.6.2 SMOTETomek
2.7 Linguistic Inquiry and Summary Scores


2.1 Overview

In the interest of making this thesis self-contained for the interested or casual reader, this chapter briefly delineates the methods used to infer the results we outline in later chapters. Readers already familiar with these methods may skip this chapter.

2.2 Time Series Analysis

Detailed below are the methods used in the tweet activity level time series analysis in Chapter 4.

2.2.1 Tests for Stationarity

The Augmented Dickey-Fuller [37] test checks for stationarity in a time series by attempting to reject the null hypothesis that the time series sample possesses a unit root (i.e. one of the roots of its characteristic equation is one).

The presence of a unit root creates problems for several inference methods, which is why this is one of the first diagnostic tests conducted on a time series sample. The presence of a unit root precludes stationarity; however, a non-stationary time series need not possess a unit root.

The test attempts to fit the time series to the following model:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots + \delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t \tag{2.1}$$

The formulation includes lags of up to order p, enabling the model to account for higher-order auto-regressive processes. The test checks the null hypothesis that γ in Equation 2.1 is zero using the following test statistic:

$$DF_\tau = \frac{\gamma}{\mathrm{StandardError}(\gamma)} \tag{2.2}$$

If this test statistic is less than (i.e. more negative than) the critical value, the null hypothesis of a unit root is rejected, which is taken as strong evidence of stationarity.
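As an illustration of the statistic in Equation 2.2, the following is a minimal sketch of the simplest special case of the test: no trend term (β = 0) and no lagged differences, i.e. the plain rather than the augmented Dickey-Fuller regression. The function name `dickey_fuller_stat` and the simulated series are illustrative assumptions and not part of our pipeline; in practice, a library implementation with tabulated critical values would be used.

```python
import itertools
import math
import random

def dickey_fuller_stat(y):
    """Return (gamma, t_statistic) for dy_t = alpha + gamma * y_{t-1} + e_t."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    # Ordinary least squares for a single regressor plus intercept.
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_error = math.sqrt(ssr / (n - 2) / sxx)
    return gamma, gamma / std_error

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]  # stationary by construction
walk = list(itertools.accumulate(noise))          # random walk: has a unit root
print("white noise:", dickey_fuller_stat(noise))
print("random walk:", dickey_fuller_stat(walk))
```

For the stationary series the statistic is strongly negative, far below the large-sample 5% critical value of roughly -2.86 for this constant-only case, whereas a random walk typically yields a statistic much closer to zero.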

2.2.2 Tests for Auto-correlation

The Ljung-Box [70] test is used for testing whether any group of auto-correlations of a time series is significantly different from zero. It falls in the category of portmanteau tests: instead of testing against an alternative hypothesis of a specific lag order, it tests against the alternative hypothesis of general serial correlation at any lag order or group of lag orders, so its alternative hypothesis includes several individual hypotheses.

The null and alternative hypotheses of the test are as follows:

H0: The data is independently distributed with no auto-correlation at any lag order.

Ha: The time series data exhibits serial correlation and is hence not independently distributed.


The test statistic is:

$$Q = n(n+2) \sum_{k=1}^{p} \frac{\rho_k^2}{n-k} \tag{2.3}$$

where n is the sample size, p is the maximum allowed lag order and $\rho_k$ is the sample auto-correlation at lag order k. The null hypothesis is rejected at significance level α if the test statistic in Equation 2.3 exceeds the chi-squared critical threshold $\chi^2_{1-\alpha,p}$.
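The Ljung-Box statistic can be computed directly from the sample auto-correlations. The sketch below is a minimal pure-Python illustration; the function name `ljung_box_q` and the toy alternating series are assumptions made for the example, and the chi-squared threshold quoted in the comment is the standard 5% value for one degree of freedom.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample auto-correlation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# A perfectly alternating series is strongly (negatively) auto-correlated at lag 1.
alternating = [1.0, -1.0] * 50
q = ljung_box_q(alternating, max_lag=1)
print(q)  # far above 3.841, the 5% chi-squared threshold with 1 degree of freedom
```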

2.2.3 Changepoint Detection

Changepoint detection refers to the process of identifying time periods when the generative process behind a time series changes, thus producing a marked change in the probability distribution of the generated samples. This can be traced using the changes in series moments or spectral density. However, in most cases, testing limits itself to mean and variance changes.

We use the pruned version of the optimal partitioning algorithm called Pruned Exact Linear Time (PELT) [54]. The rationale behind this choice is that this method runs faster than other exact methods, while also retaining optimality under mild deviations from the assumptions of the model. Additionally, the method retains its statistical power under most variations of the data.

Consider a time series of n time steps, which is to be partitioned into m + 1 intervals by m changepoints. The cost-based formulation for finding the optimal changepoints relies on minimizing the following objective function:

$$\sum_{i=1}^{m+1} \left[ C\left(y_{\tau_{i-1}+1:\tau_i}\right) \right] + \beta f(m) \tag{2.4}$$

where C is an appropriately selected cost function (usually the negative log-likelihood of an assumed generative distribution generating a sequence), y is the time-series sequence, $\tau_i$ is the i-th changepoint and $\beta f(m)$ is a regularizing term (usually linear) that prevents overfitting. Finding an assignment of sections that minimizes the objective in Equation 2.4 is possible in quadratic time by a recursive method that goes backwards in time and always conditions on the last selected changepoint. The method then focuses on the segment of the series between the start and the last selected changepoint. PELT further improves on this by pruning the recursive search space and eliminating segment candidates that cannot minimize the stated objective, thus improving the runtime to amortized linear time.
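The recursion and pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this thesis: the cost of a segment is taken to be its sum of squared deviations from the segment mean (a Gaussian change-in-mean model), the penalty is linear in the number of changepoints, and the function name `pelt_mean_shift` is hypothetical.

```python
def pelt_mean_shift(y, penalty):
    """Return the sorted start indices of every segment after the first."""
    n = len(y)
    # Prefix sums let the cost of any segment be evaluated in O(1).
    s1, s2 = [0.0], [0.0]
    for value in y:
        s1.append(s1[-1] + value)
        s2.append(s2[-1] + value * value)

    def cost(a, b):
        """Sum of squared deviations from the mean over y[a:b]."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal objective for y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment of y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning: a candidate whose cost already exceeds best[t] can never
        # become the optimal last changepoint for any later time step.
        candidates = [s for value, s in scores if value - penalty <= best[t]]
        candidates.append(t)

    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)

print(pelt_mean_shift([0.0] * 10 + [5.0] * 10, penalty=1.0))
```

On this toy series, the single shift in mean at index 10 is recovered, while a constant series yields no changepoints.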

2.3 Inferring Heavy-Tailed Distributions

Detailed below are the methods used in concluding that the out-degree distribution of verified users on Twitter follows a power law in Chapter 4.


2.3.1 Power Law MLE

If a quantity x follows a power law, its probability distribution roughly respects the following relationship:

$$p(x) = C \left( \frac{x}{x_{\min}} \right)^{-\alpha} \tag{2.5}$$

where α is the scaling parameter, C is the normalizing constant and $x_{\min}$ is the minimum value in the distribution from which the relationship is observed.

The older, established method of testing for the presence of power laws in distributions involved checking linearity in the log-log plot of frequency per probability bin against bin value. However, this method has several statistical limitations:

1. It does not enforce the PDF of the distribution to be normalized to one over the range of values of interest.

2. It does not allow for the possibility that only a range of the data is best explained by a power law.

The modern, statistically sound methodology proposed by Clauset et al. [21] involves a two-step procedure for estimating $x_{\min}$ and subsequently α. The optimal $x_{\min}$ is estimated by selecting the value that minimizes the Kolmogorov-Smirnov distance between the CDF of the remaining data S(x) and that of the power law that best approximates the data in that range P(x), as shown in Equation 2.6.

$$\hat{x}_{\min} = \arg\min_{x_{\min}} \, \max_{x \ge x_{\min}} |S(x) - P(x)| \tag{2.6}$$

Once the optimal $x_{\min}$ has been inferred, the value of α is inferred next using Maximum Likelihood Estimation. The closed-form solution is detailed in Equation 2.7.

$$\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1} \tag{2.7}$$
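The two-step procedure can be illustrated with a short sketch. It assumes continuous data (degree data are discrete, for which Clauset et al. describe separate treatment), and the function names `fit_alpha`, `ks_distance` and `choose_x_min`, as well as the synthetic sample, are illustrative assumptions rather than the code used for our analysis.

```python
import math
import random

def fit_alpha(data, x_min):
    """Closed-form MLE of the scaling parameter (Equation 2.7)."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(data, x_min, alpha):
    """Largest gap between the empirical and fitted power-law CDFs on the tail."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        worst = max(worst, abs((i + 1) / n - model), abs(i / n - model))
    return worst

def choose_x_min(data, candidates):
    """Pick the candidate x_min whose fitted power law is closest to the data."""
    return min(candidates, key=lambda c: ks_distance(data, c, fit_alpha(data, c)))

# Synthetic sample drawn from a power law with alpha = 2.5 and x_min = 1
# using inverse-transform sampling.
rng = random.Random(42)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20000)]
alpha_hat = fit_alpha(sample, 1.0)
print(alpha_hat, ks_distance(sample, 1.0, alpha_hat))
print(choose_x_min(sample, [1.0, 1.5, 2.0]))
```

The estimate of α should land close to the true value of 2.5, with a small Kolmogorov-Smirnov distance when the correct $x_{\min}$ is used.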

2.3.2 Likelihood Ratio Test

In the interest of being sure that no other heavy-tailed distribution better explains our data, we conducted likelihood ratio [117] tests vis-à-vis the Weibull, log-normal and Poisson distributions. We detail the procedure below.

Consider two candidate distributions $p_1(x)$ and $p_2(x)$, with the likelihoods of the dataset being generated by the two distributions being

$$L_1 = \prod_{i=1}^{n} p_1(x_i) \quad \text{and} \quad L_2 = \prod_{i=1}^{n} p_2(x_i) \tag{2.8}$$

The log-likelihood ratio R, as stated in Equation 2.9 thus leads us to our z test statistic.


$$R = \sum_{i=1}^{n} \left[ \ln p_1(x_i) - \ln p_2(x_i) \right] \tag{2.9}$$

The z test statistic is given by

$$z = \frac{R}{\sqrt{n}\,\omega_n} \tag{2.10}$$

where $\omega_n^2$ is the sample variance of the pointwise log-likelihood ratios. If the test statistic exceeds the (1 − α) quantile of the normal distribution, it is considered to be strong evidence in favour of $p_1$, while if the test statistic is more negative than the α quantile of the normal distribution, it is considered strong evidence in favour of $p_2$.
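A minimal sketch of this test is given below. It takes the pointwise log-likelihoods under the two candidate distributions as inputs and, as an explicit assumption, normalizes by the standard deviation of the pointwise log-likelihood ratios, as in Vuong’s formulation of the test. The function name `likelihood_ratio_z` and the toy values are illustrative.

```python
import math

def likelihood_ratio_z(loglik_1, loglik_2):
    """Return (R, z) given pointwise log-likelihoods under p1 and p2."""
    ratios = [a - b for a, b in zip(loglik_1, loglik_2)]
    n = len(ratios)
    r = sum(ratios)                                   # Equation 2.9
    mean = r / n
    omega = math.sqrt(sum((v - mean) ** 2 for v in ratios) / n)
    return r, r / (math.sqrt(n) * omega)              # Equation 2.10

# Toy pointwise log-likelihoods: p1 fits every observation better than p2.
r, z = likelihood_ratio_z([-1.0, -1.0, -1.0, -1.0], [-2.0, -4.0, -2.0, -4.0])
print(r, z)
```

A large positive z, as in this toy case, favours the first distribution; a large negative z favours the second.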

2.4 Automation Scores

In order to capture automation signatures of every account in our analysis and to uncover any bearing such signatures may have on the probability of a user being verified, we used a third-party resource called Botometer [34]. This application returns the automation probability of an account with respect to various feature groups categorized as Network, User, Temporal and Content. We extract these four group scores as well as the complete automation probability summary score output by the random forest model trained with 10-fold cross-validation. The feature group scores are also computed from the individual feature importances of each feature in that group as output by the random forest model.

2.5 Node Importance Measures

Detailed below are the methods used in inferring the centrality of a user to discovery and information flow within the Twitter verified user network. This is discussed in Chapter 4.

2.5.1 Betweenness Centrality

The betweenness centrality of a node denotes the proportion of pairwise shortest paths that contain the node. This is particularly useful in estimating centrality to information flow on a network, as most information travels from one node to another along their pairwise shortest path [65].
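For illustration, betweenness on an unweighted directed graph can be computed with Brandes’ algorithm, which accumulates shortest-path dependencies from a breadth-first search rooted at every node. The sketch below is a minimal version under assumptions: the graph is a dictionary mapping each node to its out-neighbours, the function name `betweenness` is hypothetical, and the values returned are raw path counts rather than proportions.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed, unweighted graph."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                                # nodes in order of discovery
        preds = {v: [] for v in adj}              # shortest-path predecessors
        sigma = dict.fromkeys(adj, 0)             # number of shortest paths
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        # Back-propagate dependencies from the farthest nodes inwards.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality

print(betweenness({"a": ["b"], "b": ["c"], "c": []}))
```

In the three-node chain, only the middle node lies on a shortest path between two other nodes.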

2.5.2 PageRank Centrality

PageRank seeks to quantify the influence of a node in a network based on the recursive insight that connections to influential nodes in the network should count more than connections to low-influence nodes.

The PageRank centrality of a node i is recursively defined based on the PageRank centrality of its neighbours j ∈ neighbours(i) as follows

$$PR(i_{t+1}) = \frac{1-d}{N} + d \sum_{j} \frac{a_{ji}\, PR(j_t)}{\sum_{k} a_{jk}} \tag{2.11}$$

where d denotes the damping factor, N is the number of nodes and $a_{ij}$ denotes the elements of the adjacency matrix. This recursive formulation can be solved by a method called Power Iteration, wherein iterative multiplications of the transition or adjacency matrix with a random distribution vector are performed until the distribution either reaches or approaches stationarity.
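The power iteration can be sketched in a few lines. The version below is an illustration under assumptions rather than the implementation used in this thesis: the graph is given as a dictionary of out-links, the starting vector is uniform, and the rank of nodes without out-links is redistributed uniformly, a common convention that the recursive definition above does not spell out. The function name `pagerank` and its default parameters are hypothetical.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a dict mapping node -> out-neighbours."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for j in nodes:
            for i in out_links[j]:
                new_rank[i] += d * rank[j] / len(out_links[j])
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank

cycle = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
chain = pagerank({"a": ["b"], "b": []})
print(cycle)
print(chain)
```

In a symmetric cycle every node receives the same score, while in the two-node chain the node that is linked to outranks the node that links to it.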

2.6 Fixing class imbalance

Detailed below are the methods used in rectifying the skewed class distribution between verified and non-verified users sampled in our dataset, the collection of which is outlined in Chapter 3.

2.6.1 ADASYN

Adaptive synthetic [44] sampling (ADASYN) works on the guiding principle that the best way to restore class balance in a problem is to create synthetic samples of the minority class that reflect the preexisting density distribution of the minority class, as shown in Figure 2.1.

Figure 2.1: ADASYN Class Imbalance Correction with Synthetic Samples Mirroring Minority Class Density.


Let D be a dataset of m data points with $m_l$ and $m_s$ being the number of data points in the majority and minority classes respectively. The number of data points needed to be generated to attain balance level β (β is one for perfect balance) is as follows

$$G = (m_l - m_s)\beta \tag{2.12}$$

For each sample $x_{s_i}$ in the minority class, $r_{s_i}$ denotes the proportion of majority class samples among its K nearest neighbours. This is further normalized across all minority class points so as to approximate the class density disparity in the neighbourhood of all minority class points as follows

$$\hat{r}_{s_i} = \frac{r_{s_i}}{\sum_{j=1}^{m_s} r_{s_j}} \tag{2.13}$$

Finally, the method synthetically generates $G\hat{r}_{s_i}$ minority class samples for each minority class point using random interpolation with one of its $K(1 - r_{s_i})$ nearest neighbours that belong to the same class.
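The three steps above (the generation budget G, the normalized neighbourhood ratios, and the interpolation) can be sketched as follows. This is a minimal pure-Python illustration and not the implementation used for our experiments: the function name `adasyn`, the rounding of per-point sample counts, and the fallback of duplicating a point that has no same-class neighbour among its K nearest are all assumptions.

```python
import math
import random

def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Generate synthetic minority points following the ADASYN recipe."""
    rng = random.Random(seed)
    g_total = (len(majority) - len(minority)) * beta          # Equation 2.12
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios, same_class = [], []
    for i, p in enumerate(minority):
        others = [item for idx, item in enumerate(labelled) if idx != i]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        # Proportion of majority samples among the k nearest neighbours.
        ratios.append(sum(label for _, label in nearest) / k)
        same_class.append([q for q, label in nearest if label == 0])
    total = sum(ratios)
    synthetic = []
    for p, r, mates in zip(minority, ratios, same_class):
        # Normalized ratio (Equation 2.13) times the budget, rounded (assumption).
        count = round(g_total * r / total) if total > 0 else 0
        for _ in range(count):
            q = rng.choice(mates) if mates else p
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 6.0)]
new_points = adasyn(minority, majority, k=2)
print(new_points)
```

In this toy example, two synthetic points are generated, both lying on the segment joining the two minority points.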

2.6.2 SMOTETomek

The SMOTETomek [59] approach follows a hybrid philosophy of generating minority samples as well as eliminating a few uninformative majority samples. The first part is carried out by an interpolation method called SMOTE, which is a simplified alternative to the ADASYN method discussed in Subsection 2.6.1 without the density estimation. The majority class undersampling targets uninformative samples using a method called Tomek links. These are pairs of points of opposing classes which are each other’s closest neighbours. In these pairs, the majority class sample is eliminated.
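The undersampling half of the approach hinges on finding Tomek links, which can be sketched as below. Only the link detection is shown (the SMOTE interpolation step is omitted), and the function name `tomek_links` and the toy points are illustrative assumptions.

```python
import math

def tomek_links(points, labels):
    """Return index pairs that are mutual nearest neighbours of opposing classes."""
    def nearest(i):
        others = (j for j in range(len(points)) if j != i)
        return min(others, key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]

points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0)]
labels = [0, 1, 1, 1]          # label 1 is the majority class here
links = tomek_links(points, labels)
print(links)
# The majority-class member of each link would then be dropped.
to_drop = [i if labels[i] == 1 else j for i, j in links]
print(to_drop)
```

The first two points form a Tomek link because they are mutual nearest neighbours with different labels; the last two are mutual nearest neighbours but share a label, so they are left untouched.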

Figure 2.2: SMOTETomek Class Imbalance Correction with Tomek Links Highlighted.

2.7 Linguistic Inquiry and Summary Scores

We have made use of linguistic inquiry tools [89] to capture abstract aspects of user content. These are captured mainly by the four LIWC summary variables: Clout, Analytic, Authentic and Tone.


The Analytic summary variable is derived from eight function word dimensions which capture the extent to which hierarchical, logical and functional thinking patterns are represented in a person’s writing [90]. This is usually anti-correlated with the presence of narrative and personal language. The Clout summary statistic intends to capture the extent to which initiative, social influence and confidence in expression are present within a user’s content [52]. Analogously, the Authentic summary variable tries to capture honesty expressed in content, usually by means of vulnerability, deference or humility [87]. Lastly, the Tone summary score attempts to condense positive and negative emotion scores into a single variable [22].


Chapter 3

Dataset

Chapter Guide

3.1 Overview
3.2 User Metadata
3.3 Content Features
3.4 Temporal Features
3.5 Miscellaneous Features
3.6 Rectifying Class Imbalance
3.7 Conclusion


3.1 Overview

In this chapter, we present details of our dataset and the data collection process, along with a summary of the diverse array of collected features. Accompanying the list of features we sought to collect is the rationale behind why they were deemed of sufficient importance, including notable past use cases.

The @verified handle on Twitter follows all accounts on the platform that are currently verified. We queried this handle on the 18th of July 2018 for IDs of interest.

3.2 User Metadata

The @verified handle on Twitter follows all accounts on the platform that are currently verified. We queried this handle on the 18th of July 2018 and extracted the IDs of 297,776 users who were verified at the time. We further restricted our work to the subset of users who had English listed as their profile language, thus enabling us to focus on the single largest linguistic demographic on the platform [82] and leaving us with 231,235 English verified users. For each verified user, we also queried the API in order to obtain the list of outlinks or friends of users that belonged to the aforementioned English subset. We filtered this list of friends and retained only those friends who were themselves verified users, thus obtaining the internal network existing among the verified users. The final network was an extremely sparse one with a density of 0.00148, comprising 231,235 English verified users with 79,213,811 directed edges between them. The network contains only 6,027 isolated users, with an average out-degree of 342.55 and a maximum out-degree of 114,815. The network had a notable giant strongly-connected component of 224,872 users, which accounts for 97.24% of the total English verified users. In all, the network contains 6,251 connected components.
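As a consistency check on the reported density, and assuming the standard definition for a directed graph without self-loops (the number of edges divided by the number of ordered node pairs), the node and edge counts above give:

$$\text{density} = \frac{|E|}{|V|\,(|V| - 1)} = \frac{79{,}213{,}811}{231{,}235 \times 231{,}234} \approx 0.00148$$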

In the interest of verifying Twitter’s assertion that the likelihood of a handle’s verification is commensurate with the public interest in that handle and nothing else [112, 113], we sought to obtain a random controlled subset of non-verified users on the platform. Pursuant to this need, we leveraged Twitter’s Firehose API, a near real-time stream of public tweets and accompanying author metadata, in order to acquire a random set of 284,312 non-verified users, controlling for a conventional measure of public interest by ensuring that the number of followers of every non-verified user obtained was within 2% of that of a unique verified user that we had previously acquired.
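One way to realise such a follower-matched control sample is a greedy one-to-one matching, sketched below. This is a hypothetical illustration of the matching criterion only, not the procedure actually run against the Firehose: the function name `match_controls`, the greedy first-fit strategy and the toy follower counts are all assumptions.

```python
import bisect

def match_controls(verified_followers, candidate_followers, tolerance=0.02):
    """Pair each candidate with an unused verified user of similar follower count."""
    pool = sorted(verified_followers)
    matches = []
    for count in candidate_followers:
        # A verified count v qualifies when |count - v| <= tolerance * v,
        # i.e. count / (1 + tolerance) <= v <= count / (1 - tolerance).
        idx = bisect.bisect_left(pool, count / (1 + tolerance))
        if idx < len(pool) and pool[idx] <= count / (1 - tolerance):
            matches.append((count, pool.pop(idx)))  # each verified user used once
    return matches

pairs = match_controls([100, 1000], [101, 99, 1015, 5000])
print(pairs)
```

Here the candidate with 99 followers is rejected because the only nearby verified user has already been matched, and the candidate with 5,000 followers has no verified counterpart within 2%.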

Twitter provides a REST Application Programming Interface (API) with various endpoints that make organized data retrieval from the site easier. We used the REST API to acquire profile metadata of the user handles obtained previously, including account age and the number of friends, followers and tweets. Additionally, we obtained the number of public Twitter lists a user is a part of and the handle’s profile description. Metadata features extracted from user profiles have previously been used for classifying users and inferring activity patterns on Twitter [80, 125]. We again restricted our work to the subset of users who had English listed as their profile language, thus enabling us to focus on the largest linguistic demographic on the platform [82] and leaving us with 231,235 English verified users and 175,930 non-verified users. The distributions of various metadata are depicted in Figure 3.1.


https://twitter.com/verified


(a) Log Scaled Number of Users vs Friends. (b) Log Scaled Number of Users vs Followers.

(c) Log Scaled Number of Users vs List Memberships. (d) Log Scaled Number of Users vs Status Count.

Figure 3.1: Distribution of Friends, Followers, Public List Memberships and Tweet Activity.


3.3 Content Features

Utilizing Twitter’s Firehose API, we retrospectively acquired all tweets authored by the aforementioned users over a one-year collection period spanning from 1st June 2017 to 31st May 2018. In total, our collection process acquired 494,452,786 tweets. The tweet texts were retained, and any accompanying media such as GIFs were deemed surplus to requirements and discarded.

From the text, we extracted linguistic and stylistic features such as the number and proportion of Part-Of-Speech (POS) tags, effectively obtaining a user’s breakdown of natural language component usage. Work demonstrating the importance of content features in location inference [72], tweet classification [7], and network characterization [67] further led us to extract the frequency of hashtags, retweets, mentions and external links used by each user. Prompted by studies showing that the deceptiveness of tweets could be inferred from the length of sentences constituting them [3], we computed additional features including average words per sentence, average words per tweet, character-level entropy and the frequency and proportion of long words (word length greater than six letters) per user.
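A few of the simpler stylistic features can be sketched directly. The snippet below is an illustration under assumptions, not our extraction code: words are obtained by whitespace splitting, entropy is measured in bits over raw characters, and the function names are hypothetical.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (in bits) of the character distribution of a text."""
    counts = Counter(text)
    return -sum(c / len(text) * math.log2(c / len(text)) for c in counts.values())

def words_per_tweet(tweets):
    """Average number of whitespace-separated words per tweet."""
    return sum(len(tweet.split()) for tweet in tweets) / len(tweets)

def long_word_proportion(tweets, min_letters=7):
    """Share of words with more than six letters."""
    words = [word for tweet in tweets for word in tweet.split()]
    long_words = [w for w in words if sum(ch.isalpha() for ch in w) >= min_letters]
    return len(long_words) / len(words)

print(char_entropy("aabb"))
print(words_per_tweet(["a b c", "d"]))
print(long_word_proportion(["verified accounts on Twitter"]))
```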

In the interest of better discerning the emotions conveyed by the tweets authored by a user and the responses they may evoke in the potential audience, sentiment analysis presented itself as a useful tool. Sentiment gleaned from Twitter conversations has been used to predict financial outcomes [9], electoral outcomes [2] as well as the ease of content dissemination [34]. We used VADER [48], a popular social media sentiment analysis lexicon, which has previously been used in a plethora of applications ranging from predicting elections [2, 92] to forecasting cryptocurrency market fluctuations [103]. We extracted positive, negative and neutral sentiment scores and an additional fourth compound score, which is a nonlinear normalized sum of valence computed based on established heuristics [121] and a sentiment lexicon. All four scores are computed per user, weighted by tweet length.

3.4 Temporal Features

Existing research suggests that temporal features relating to content generation and activity levels on Twitter can be used to infer emergent trending topics [14] as well as influential users [64].

Leveraging the Twitter Firehose, we gathered fine-grained time series of user statistics including the number of friends, followers and statuses, thus permitting us to compute their averages over our one-year collection period. Furthermore, positing that a user’s likelihood of verification may be predicated on how ascendant their reach on the platform is, we compute the proportion of friends and followers gained over the last one month and the last three months of our collection period. Additionally, similar trajectory-encoding features are computed for tweet activity levels over the aforementioned one and three-month windows, and the average time between statuses is extracted using the status count time series on a per-user basis.
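The trajectory features can be illustrated with a small sketch. It assumes, purely for the example, that each time series holds one cumulative count per month in chronological order and that the gain over a window is expressed as a fraction of the final count; the function name `proportion_gained` is hypothetical.

```python
def proportion_gained(series, window):
    """Fraction of the final count that was gained over the last `window` samples."""
    final = series[-1]
    if final == 0:
        return 0.0
    return (final - series[-1 - window]) / final

# Hypothetical month-end follower counts for one user.
followers = [100, 110, 120, 150, 200]
print(proportion_gained(followers, window=1))  # last one month
print(proportion_gained(followers, window=3))  # last three months
```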


User Metadata
• Number of followers
• Number of friends
• Number of statuses
• Number of public list memberships
• Account age

Temporal Features
• Average number of followers last year
• Average number of friends last year
• Average number of statuses last year
• Proportion of followers gained in last 3 months
• Proportion of friends gained in last 3 months
• Proportion of statuses generated in last 3 months
• Proportion of followers gained in last 1 month
• Proportion of friends gained in last 1 month
• Proportion of statuses generated in last 1 month
• Average duration between statuses

Content Features
• Number of POS tags¹
• Frequency of POS tags¹
• Average number of words per sentence
• Average number of words per tweet
• Character level entropy
• Proportion of long words²
• Positive sentiment score³
• Negative sentiment score³
• Neutral sentiment score³
• Compound sentiment score³
• Frequency of hashtags
• Frequency of retweets
• Frequency of mentions
• Frequency of external links posted

Miscellaneous Features
• LIWC analytic summary score
• LIWC authentic summary score
• LIWC clout summary score
• LIWC tone summary score
• Botometer complete automation probability
• Botometer network score
• Botometer content score
• Botometer temporal score
• Tweet topic distribution⁴

Table 3.1: List of features extracted per user by our framework.

¹ Part Of Speech (POS) tags include nouns, personal pronouns, impersonal pronouns, adjectives, adverbs, verbs, auxiliary verbs, prepositions and articles.
² Long words are defined as words longer than 6 letters.
³ Sentiment scores are weighted over all tweets of a user by tweet length.
⁴ Scores over 100 topics are extracted from the tweets.

3.5 Miscellaneous Features

Attempting to capture qualitative cognitive and emotional cues from a user’s tweets, we acquired the four LIWC 2015 [89] summary statistics named Analytic, Clout, Authentic and Tone for each user in our dataset. The summary dimensions indicate the presence of logical and hierarchical thinking patterns, confidence and leadership, personal cues and emotional tone, respectively, in the tweets of a user. LIWC categories have been scientifically validated to perform well in determining affect on Twitter [27, 116] and have been previously used to detect sarcasm [39] and for mental health diagnoses from Twitter conversations [24].


Furthermore, positing that accounts perceived as being entirely or partially automated may have a harder time getting verified, we leveraged Botometer, a flagship bot detection solution [115] that exposes a free public API. The system is trained on thousands of instances of social bots, and the creators report AUC ROC scores between 0.89 and 0.95. Botometer utilizes features spanning the gamut from network attributes to temporal activity patterns. Additionally, it queries Twitter to extract 300 recent tweets and publicly available account metadata and feeds these features to an ensemble of machine learning classifiers, which produces a Complete Automation Probability (CAP) score that we acquire for every user in our dataset. We also augment our dataset with the temporal, network and content category automation scores for each user.

Finally, we also examine the topics about which users tweet. Topic modelling has been effectively used in categorizing trending topics on Twitter [129] and inferring author attributes from tweet content [75]. To this end, we ran the Gibbs sampling based Mallet implementation of Latent Dirichlet Allocation (LDA) [74], setting the number of topics to 100 with 1000 iterations of sampling. Although such a topic model could be applied on a per-tweet basis and subsequently aggregated by user, we find this approach to not work very well as most tweets are simply a sentence long. To overcome this difficulty, we follow the workaround adopted by previous studies by aggregating all the tweets of a user into a single document [47, 123]. In effect, this treatment can be regarded as an application of the author-topic model [104] to tweets, where each document has a single author.

3.6 Rectifying Class Imbalance

Focusing our analysis on the Twitter Anglosphere left us with a substantially skewed class distribution of 231,235 verified users and 175,930 non-verified users in our dataset. In keeping with existing research on imbalanced learning on Twitter data [43, 81], we used a two-pronged approach to rectify this: a minority over-sampling technique named ADASYN [44], which generates samples based on the feature space of the minority examples, and a hybrid over- and under-sampling technique called SMOTETomek, which in addition to generating minority class samples synthetically also eliminates samples of the over-represented class [59] and has been found to give exemplary results on imbalanced datasets [6]. The members of the over-represented class are eliminated using a measure called Tomek links. Pairs of points of opposing classes are found which are each other’s closest neighbours in the feature space, following which the majority class member in the pair is removed. Augmenting our classifier’s training data in the previously mentioned manner allowed us to attain near-perfect classification scores.

3.7 Conclusion

The diverse and extensive set of features collected aims to capture all the aspects of a user’s online presence including interaction tendencies, the evolution of activity, reach on the platform, communities they are a part of, topics of interest, etc. Aiming to obtain a complete representation of users on Twitter is intended to give us the best possible chance of understanding what sets the verified users on the platform apart, a question we investigate in detail in the coming chapters.

The data collected is classified and summarized in Table 3.1. We have anonymized this dataset and made it accessible to the public, in a manner compliant with Twitter’s terms, following the publication of this work.



Chapter 4

Characterizing Verified Users

Chapter Guide

4.1 Overview
    4.1.1 Motivation
4.2 Related Work
4.3 Network Analysis
    4.3.1 Basic Analysis
    4.3.2 Degree and Eigenvalue Distribution
    4.3.3 Reciprocity
    4.3.4 Degrees of Separation
    4.3.5 Verified User Bios
    4.3.6 Centrality
4.4 Activity Analysis
4.5 Conclusion and Future Work


4.1 Overview

All major social networking websites, including Twitter, Facebook, and Instagram, support the concept of verified accounts¹, wherein users are independently authenticated by the platform. This status is usually conferred on accounts of well-known public personalities and businesses and is indicated with a badge next to the screen name.

For most of its existence, Twitter’s own verification process has been an opaque one, punctuated by sporadic attempts to open up the process to the general audience. Their verification policy [112] states that an account is verified if it belongs to a personality or business deemed to be of sufficient public interest in diverse fields, such as journalism, politics, sports, etc. The exact intricacies of what they consider before verifying a handle are a trade secret. However, characterizing these users and examining the ways in which the Twitter sub-graph induced by verified user nodes differs from the whole Twitter user network may yield insights into what discriminates verified users from non-verified ones.

4.1.1 Motivation

Despite repeated claims by Twitter that verification is not equivalent to accreditation, literature from social sciences [83] and psychology [35] suggests that the presence of a verified badge can add further credibility to the tweets made by a user handle. In addition, prior psychological tests [36] have also revealed that the credibility of a message and its reception is influenced by its purported source and presentation rather than its pertinence or credulity. Existing work [32] also indicates that widely endorsed information originating from a well-known source is easier to perceive as trustworthy. Owners of verified accounts are usually well-known, and their content is on average more frequently liked and retweeted than that of the generic Twittersphere [102, 107].

Tweets pose a challenging scenario for credibility assessment owing to their limited length, negligible customization of visual design, and the frenetic pace at which they are consumed, with an average user devoting only three seconds of attention per tweet [25]. Users may often resort to heuristics in order to judge online content. Heuristic-based models for online credibility evaluation are presented in [16, 105]. Particularly relevant to this inquiry are the endorsement heuristic, under which credibility is conferred on a source by a recognised endorsement (e.g. a verified badge), and the consistency heuristic, which stems from endorsements by several authorities (e.g. a user verified on one platform is likely to be verified on others).

In light of the aforementioned evidence, along with pervasive fake content in the Twittersphere, our work explores how possessing a verified status can make a difference in the outreach and influence of a brand or individual in terms of extent and quality. Characterizing these “elite” users in isolation represents the first step in understanding how to become one of them.

¹ The exact term varies by platform, with Facebook using the term “Verified Profiles”. However, in the interest of consistency, all owner-authenticated accounts are referred to as verified accounts, and their owners as verified users.


The rest of the chapter is organized as follows. Section 4.2 details relevant prior work, thereby putting our work in perspective. In Section 4.3 and Section 4.4, we conduct network and activity analysis on verified users, respectively. We conclude with Section 4.5.

4.2 Related Work

In this section, we delve into previous work on social network analysis and verified accounts.

Kwak et al. [61] were among the first to study Twitter with the aim of understanding its role in the web - whether it was better approximated as a traditional social networking site or a news source. They show how Twitter is a powerful network that can be used to study online human behaviour and also report the ways in which an online social network such as Twitter differs from human social networks. Castillo et al. [13] attempt to identify credible tweets based on a variety of profile features, including whether the user was authenticated by the platform or not. Morris et al. [83] examined factors that influence profile credibility perceptions on Twitter. They found that possessing an authenticated status is one of the most robust predictors of positive credibility. Semertzidis et al. [97] looked into the content of user biographies on Twitter and studied the presence of homophily with respect to their topical content.

Chu et al. [18] used a slew of features to identify Twitter handles that are generating automated content. One of the crucial elements for their analysis was the presence of a verification badge. Along similar lines, Hentschel et al. [46] assert that most non-verified users on Twitter are within 7 degrees of separation of a verified user and a vast majority of spam handles are located within 7-10 degrees of separation from verified users. This finding is promising with respect to fighting spam on Twitter as it suggests a white-listing mechanism for maintaining a core of non-spam users who are within a few degrees of separation of verified users.

Java et al. [51] analyzed Twitter’s user base and activity-level growth. They studied the geographical distribution of users along with graph-based inquiries such as network reciprocity and region-localized clustering coefficients. Wang et al. [123] uncovered influential handles on Twitter and proposed a metric as an improvement to the Topical PageRank employed by Twitter at the time. A positive correlation of PageRank with conventional metrics of the extent of influence, such as the number of followers, has been found for the entire Twittersphere [61]. However, such an attempt has not yet been made for the network internal to the verified users on the platform. A large number of reciprocal connections on the Twitter network can be explained by homophily of topical interests. Whether verified users, too, form reciprocal network links based on the same underlying principle was yet to be explored.

To summarize, there exists a rich body of literature studying the characterization of users in the entire Twitter network. However, none of these works, to the best of our knowledge, has attempted to characterize the Twitter sub-graph induced by the verified users. To that end, we run a battery of tests on the extracted network of verified users in order to uncover how this sub-graph behaves in comparison to the entire Twitter network.


4.3 Network Analysis

We attempt to quantify how our network of verified users with English as their primary language, when considered in isolation, differs from the entire network. We analyze and compare our results to previous work on the Twitter network in its entirety.

4.3.1 Basic Analysis

The extracted network graph exhibits a very low density but a high level of connectedness. Out of the 231,235 English verified user nodes, only 6,027 are isolated, making the minimum out-degree 0, while a majority of the users belong to a single giant connected component. The greatest out-degree is 114,815 for the handle of a social media influencer, @6BillionPeople, while the greatest in-degree is 78,101 for the handle of the former US President, @BarackObama. The average in-degree and out-degree are both 342.55. The low density of the verified user network, as mentioned in the previous section, is further confirmed by a low average local clustering coefficient of 0.1583. The network has a slight degree disassortativity of -0.04, which is in contrast to the degree homophily formerly observed for the entire Twitter network [61] and social networks in general [76]. This suggests the existence of a large number of one-way relationships between prominent and semi-famous (medium degree) personalities, which is further reinforced by the presence of 6,091 attracting components (components that a random walk never leaves once it enters) in the directed graph. At the core of these components lie famous personalities (high in-degree users) who do not follow any other handle. These include handles of popular culture outlets such as @ladbible, Hollywood screenwriters such as @MrRPMurphy, and world-renowned spiritual gurus such as @SriSri. Figure 3.1 displays the distributions of certain user metrics within the sub-graph of verified users.
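Several of the basic quantities reported above (density, average degree, extreme degrees, isolated nodes) follow directly from the edge list. The sketch below shows one way to compute them in plain Python; the function name and the toy graph are assumptions for illustration, and an edge (u, v) is taken to mean "u follows v".

```python
def basic_graph_stats(nodes, edges):
    """Density, average degree, extreme degrees and isolated nodes of a directed graph."""
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops and duplicates
    out_deg = dict.fromkeys(nodes, 0)
    in_deg = dict.fromkeys(nodes, 0)
    for u, v in edge_set:
        out_deg[u] += 1
        in_deg[v] += 1
    n, m = len(nodes), len(edge_set)
    return {
        # A directed graph on n nodes can hold at most n * (n - 1) edges.
        "density": m / (n * (n - 1)),
        # Every edge adds one to both the in-degree and out-degree totals,
        # so the average in-degree and average out-degree are always equal.
        "average_degree": m / n,
        "max_out_degree": max(out_deg.values()),
        "max_in_degree": max(in_deg.values()),
        "isolated": sorted(u for u in nodes if out_deg[u] == 0 and in_deg[u] == 0),
    }

# Hypothetical toy graph with one isolated node.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "a"), ("a", "c")]
stats = basic_graph_stats(nodes, edges)
print(stats)
```

The comment on the average degree explains why a single figure is reported for both the average in-degree and the average out-degree of the network.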

4.3.2 Degree and Eigenvalue Distribution

The power-law is a key component in characterizing the degree distribution of networks gathered from the World Wide Web and other sizeable information sources. It is one of the early highlights in the study of web-graphs and social networks [86]. This matured into a series of theoretical inquiries into the presence of power-laws in other aspects of network structure, such as eigenvalues of the Laplacian [20, 78]. However, Kwak et al. [61] reported an absence of a power-law in the degree distribution when analyzing the Twitter network as a whole. This stands in contrast to our findings of a power-law almost entirely accounting for the out-degree distribution in the network of verified users. It also falls in line with existing work [95] that identifies emergent properties observed in sampled sub-graphs but not seen in the graph as a whole. Subsequent work [31], though, has mostly confirmed the presence of a power-law in the degree and Laplacian eigenvalue distributions of several synthetic and real-world undirected social network datasets.

We computed the out-degree distribution as well as the largest 10,000 eigenvalues of the Laplacian matrix of the sub-graph. We discarded most of the smaller eigenvalues, as the sparsity of our sub-graph resulted in most of those eigenvalues being close to zero, which could have caused floating-point issues while inferring the power-law. The eigenvalues were computed using the power iteration method in existing solvers. For both these distributions, we seek to calculate the exponent α and an x_min threshold, which represents the lower bound of the best-fit range.

Figure 4.1: Log-Log Scaled Distribution of Proportion of Users to Out-Degree.

The power-law parameters α and x_min are inferred using the maximum-likelihood algorithm by Clauset et al. [21]. This approach is considered more accurate than the traditional method of fitting the slope of the log-log plot. For the degree distribution, we use the discrete maximum likelihood estimate (MLE), while for the eigenvalue distribution, we use the continuous MLE. We employ the particular implementation by Nepusz [85], which uses the BFGS algorithm to infer the most likely parameter values. Moreover, this method calculates a goodness-of-fit parameter p, based on the Kolmogorov-Smirnov distance, that indicates whether the power-law fit is likely to be significant. This score is based on a randomized procedure. If p > 0.1, the power-law hypothesis cannot be rejected and the presence of a power-law is considered justified.
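For the continuous case, the maximum-likelihood estimate of the exponent has a closed form once x_min is fixed: α = 1 + n / Σ ln(x_i / x_min), summed over the n observations with x_i ≥ x_min. The sketch below computes this estimate and the Kolmogorov-Smirnov distance between the empirical and fitted tail distributions. It is a simplified illustration rather than the implementation by Nepusz [85]: x_min is taken as given, the discrete case and the randomized p value are omitted, and the function names and toy sample are assumptions.

```python
import math

def fit_alpha_continuous(values, x_min):
    """Continuous power-law MLE of the exponent, using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

def ks_distance(values, x_min, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    tail = sorted(x for x in values if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst

# Hypothetical toy sample: evenly spaced quantiles of a power law
# with exponent 3.18 and x_min = 1 (inverse-transform construction).
true_alpha, x_min, n = 3.18, 1.0, 2000
sample = [x_min * (1.0 - (k + 0.5) / n) ** (-1.0 / (true_alpha - 1.0)) for k in range(n)]

alpha_hat = fit_alpha_continuous(sample, x_min)
distance = ks_distance(sample, x_min, alpha_hat)
print(round(alpha_hat, 2), distance)
```

In the full procedure of Clauset et al. [21], the fit is repeated for candidate values of x_min and the one yielding the smallest Kolmogorov-Smirnov distance is retained.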

Continuous MLE inference for the eigenvalues yields parameter estimates of 3.18 for α and 9377.26 for x_min with a p value of 0.3, thus indicating a very strong fit. Discrete MLE inference for the degree distribution yields parameter estimates of 3.24 for α and 1334 for x_min with a p value of 0.13, indicating a significant fit. However, the closeness of the degree p value to the threshold of 0.1 prompts us to conduct further pairwise tests to rule out other candidate distributions. We use an R toolbox [26] to perform Vuong’s likelihood-ratio test between a power-law fit and alternative candidates such as log-normal, Poisson and exponential fits. In each case, the tests returned high likelihood-ratio values (two to three digits), indicating that the power-law was, in fact, the distribution that best approximated the out-degree distribution in our sub-graph. The relationship between out-degree values and the proportion of users possessing them can be seen in Figure 4.1.

4.3.3 Reciprocity

The reciprocity rate refers to the proportion of links that go both ways. Kwak et al. [61] have previously reported a reciprocity rate of 22.1% among the directed links in the entire Twitter network. The verified user network has a considerably higher reciprocity rate of 33.7%. This is still much lower than what is observed in other well-known social networks such as Flickr (68%) [19]. A likely cause is that entities such as brands and third-party sources of curated and crawled information, which typically do not reciprocate engagements, are over-represented on Twitter. We conjecture that the higher reciprocity rate vis-à-vis the whole Twitter graph is due to a larger core of publicly relevant and consequential personalities within this sub-graph. We leave validating this assertion for future work.
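As an illustration, one common way to compute a reciprocity rate is the fraction of directed links whose reverse link is also present. The sketch below uses that definition, which is an assumption here, since pair-based variants also exist and the exact variant behind the reported figures is not specified. The function name and toy edge list are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed links (u, v) for which the reverse link (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops and duplicates
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Hypothetical toy follow graph: a and b follow each other, the other links are one-way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
rate = reciprocity(edges)
print(rate)
```

Two of the four links in the toy graph are reciprocated, so the rate is 0.5.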

4.3.4 Degrees of Separation

Ever since Stanley Milgram’s seminal work on the “Six Degrees of Separation” [79], the concept of using the distribution of pairwise node distances to characterize a social network has become commonplace. Watts et al. [122] proposed their small-world model after finding that many social and technological networks possessed small average path lengths. Prior work [66] on an MSN messenger network of 180 million users revealed a median separation of 6 and an effective diameter (90th percentile path length) of 7.8.

The network of verified users on Twitter differs from the aforementioned networks as it is a network with directed edges. Thus, conventional graph-theoretic wisdom would lead one to expect higher average path lengths, as a path from one node to another need not be traversable in the reverse direction. However, our analysis reveals the average node distance to be 2.74, after omitting isolated nodes. Such a low number is especially surprising given that the reciprocity rate is much lower than that of other directed networks such as Flickr. This value is considerably lower than the value of 4.12 reported for the general Twittersphere through a sampling mechanism [61]. The distribution of pairwise node distances in the English verified sub-graph can be seen in Figure 4.2.

Later work [4], using a bounded bi-directional search approach, found the average shortest path length on Twitter to be 3.43. This is still considerably higher than that of the verified sub-graph and reinforces the finding that while the Twitter verified sub-graph is sparse in its own right, it is still significantly denser than the Twitter graph at large.
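The average node distance in a directed graph can be computed by running a breadth-first search from every node and averaging the hop counts over all ordered pairs for which a directed path exists. The sketch below assumes an adjacency-list dictionary in which every node appears as a key; the function names and the toy graph are hypothetical. An all-pairs search of this kind is only feasible for modest graph sizes, and larger graphs typically call for sampling or bounded search.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every node reachable along directed edges."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in distances:
                distances[v] = distances[u] + 1
                queue.append(v)
    return distances

def average_distance(adj):
    """Mean shortest-path length over all ordered pairs connected by a directed path."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical toy graph: a directed 3-cycle, where each node reaches
# one node in one hop and the other in two hops.
adj = {"a": ["b"], "b": ["c"], "c": ["a"]}
avg = average_distance(adj)
print(avg)
```

The toy cycle also shows the effect of edge direction: an undirected triangle would have an average distance of 1, whereas the directed cycle yields 1.5.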


Figure 4.2: Log Scaled Distribution of Number of Node Pairs vs Degrees of Separation. Despite Average Distances being Low, a Large Number of Pairs Exceed The Previously Speculated Six Degrees of Separation.

4.3.5 Verified User Bios

Each user on Twitter can have a biography (or bio), allowing them to describe themselves using a limited number of characters. We attempt to gain insights from some of the most popular unigrams, bigrams and trigrams occurring in the bios of verified users, after filtering out n-grams constituted mainly of non-informative words.
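A minimal sketch of such an n-gram count is given below. The tokenisation rule, the stop-word list, and the reading of "constituted mainly of non-informative words" as "more than half of the words are stop words" are all assumptions made for illustration, as are the toy bios.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return zip(*(tokens[i:] for i in range(n)))

def top_ngrams(bios, n, k):
    """The k most frequent n-grams, skipping those made up mainly of stop words."""
    counts = Counter()
    for bio in bios:
        tokens = re.findall(r"[a-z0-9']+", bio.lower())
        for gram in ngrams(tokens, n):
            if 2 * sum(word in STOPWORDS for word in gram) > n:
                continue  # more than half of the words are non-informative
            counts[gram] += 1
    return counts.most_common(k)

# Hypothetical toy bios.
bios = [
    "Official Twitter account of the band",
    "Official Twitter page",
    "Editor in chief of the paper",
]
top_bigram = top_ngrams(bios, 2, 1)
print(top_bigram)
```

Under this rule a trigram such as "editor in chief" is retained, since only one of its three words is a stop word, while a bigram such as "of the" is discarded.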

The most frequent unigrams portray several underlying themes. They include cross-links to other social media accounts of an entity (“Instagram”, “Facebook” and “Snapchat”), personal descriptors (“Husband”, “Father” and “Gay”), professional descriptors (“Producer”, “Founder”, “Director”, “Tech”, “Author” and “Sport”), and terms relevant to businesses and brands online (“Booking”, “Support”, “International” and “Official”). Some unigrams such as “American” and “London” also hint at the most dominant sources of activity in the Anglospheric Twitter. Figure 4.3 illustrates a word cloud of the most frequent unigrams.

Bigrams and trigrams reiterate a largely similar narrative, dominated by generic descriptors (“Official Account” and “Official Twitter Page”), accomplishment descriptors (“Award Winning”, “Olympic Gold Medalist” and “Best Selling Author”), professional descriptors (“Singer Songwriter” and “Professional Rugby Player”), and business and community-related terms (“Report Crime Here”, “Monday to Friday” and “Weather Alerts EN”). The most frequent bigrams and trigrams, along with their respective frequencies, can be seen in Table 4.1 and Table 4.2.

Figure 4.3: Wordcloud of Most Frequent Unigrams in Bios of Verified Users.

A running theme common to all three cases is the dominance of journalists and news and weather outlets. Several of the most frequent unigrams (“Journalist”, “Reporter” and “Editor”), bigrams (“Breaking News” and “Anchor Reporter”), and trigrams (“New York Times”, “Wall Street Journal” and “Editor in Chief”) are apropos of journalism. Being a pre-eminent journalist in an English media outlet seems to be one of the surest ways to get verified on Twitter.

4.3.6 Centrality

To gain a better understanding of verified users, we investigate how various centrality measures correlate with one another. These observations are illustrated in Figure 4.4. We first study the relationship between the number of tweets made by a user and their number of followers. We observed that, in this respect, the English verified sub-graph behaves like the entire Twitter graph as previously reported in [61]; the number of followers trends upwards with an increase in the number of statuses, and this trend becomes more apparent at higher extremes. Next, we look into how the number of public lists a user is a part of varies with the user’s number of followers. List membership has been shown to be a robust predictor of influence and topical relevance on Twitter [101], and has shown competitive performance in topically recommending influential users to follow. This is further reinforced by our observation that the number of followers a user has almost exclusively trends upwards with an increase in the number of list memberships. Diminishing returns set in only in the very upper echelons.

Table 4.1: Most Popular Bigrams in Bios of Verified Users.

Bigram                Occurrences
Official Twitter      12166
Official Account      2788
Award Winning         2270
Follow Us             2268
Co Founder            1581
Husband Father        1540
Opinions Own          1222
New Album             1088
Singer Songwriter     1043
Co Host               933
Latest News           904
Breaking News         898
Anchor Reporter       855
Rugby Player          799
Managing Editor       769

Table 4.2: Most Popular Trigrams in Bios of Verified Users.

Trigram                         Occurrences
Official Twitter Account        5457
Official Twitter Page           1774
Weather Alerts EN               847
Emmy Award Winning              475
New York Times                  464
Editor in Chief                 461
Best Selling Author             296
Professional Rugby Player       253
Wall Street Journal             252
Professional Baseball Player    241
Report Crime Here               238
Award Winning Journalist        223
For Customer Service            174
Olympic Gold Medalist           174
Monday to Friday                174

Figure 4.4: Log-Log Scaled Scatter Plots of Various Influence Measures. The Regression Splines and 95% Confidence Intervals are Computed Using a Generalized Additive Model.

(a) List Memberships vs Betweenness Centrality. Exhibits No Relationship for Low Values but Strong Correlation for High Values.
(b) Follower Count vs Betweenness Centrality. Exhibits Weak Correlation for Low Values but Strong Correlation for High Values.
(c) List Memberships vs PageRank Centrality. Exhibits Strong Correlation for All Values.
(d) Follower Count vs PageRank Centrality. Exhibits Strong Correlation for All Values.
(e) Follower Count vs Status Count. Exhibits Strong Correlation for All Values.
(f) Follower Count vs List Memberships. Exhibits Strong Correlation for All Values.

As claimed by Twitter’s guiding principle, a user is only verified when they are deemed to be of sufficient public interest. We posit that the Betweenness and PageRank centrality of a user in the sub-graph of English verified users can predict their reach in the overall network, as measured by, for example, the follower count. On testing, we observed that public list membership and follower count in the entire Twitter network are positively correlated with the PageRank and Betweenness of that user in the English verified user sub-graph. In particular, the relationships between PageRank and both total followers and list memberships were especially strong. Even though the correlation between follower count and Betweenness seems lukewarm at first, a strong relationship emerges at higher extremes. Hence, our findings demonstrate that how strongly a user is embedded in the Twitter verified user network is highly predictive of their reach in the generic Twittersphere. The scatter plots, acquired regression splines, and confidence intervals are shown in Figure 4.4.


Figure 4.5: Calendar Maps for Verified User Tweet Activity Levels Over our One-Year Collection Period.

4.4 Activity Analysis

Finally, we attempt to characterize the collective tweet activity time series of the network of English verified users. A calendar heatmap of the verified user tweet activity levels over our collection period of one year can be seen in Figure 4.5. We check for autocorrelations in the time series using implementations [96] of the Ljung-Box and the Box-Pierce portmanteau tests. These tests check for a deviation from the null hypothesis of no autocorrelation using a combination of lags, rather than autocorrelation with respect to a specific lag. If the p values returned by a test are smaller than 0.05, the null hypothesis of no autocorrelation is rejected at the 5% significance level. We tested for a lag of up to 185 days so as to be able to account for any seasonal autocorrelation (quarterly or semi-annual). The Ljung-Box and Box-Pierce tests return maximum p values of 3.81 × 10^-38 and 7.57 × 10^-38 respectively, so the null hypothesis of no autocorrelation is firmly rejected and the series exhibits significant lagged correlation. This is in line with the weekly pattern visible in the calendar heatmap, where activity rates on Sundays are reliably lower than those on weekdays.
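For reference, both portmanteau statistics are simple functions of the sample autocorrelations ρ_k up to a maximum lag h: the Box-Pierce statistic is n Σ ρ_k², and the Ljung-Box statistic is n(n + 2) Σ ρ_k² / (n − k), with both sums running over k = 1, ..., h. The p value is then obtained by comparing the statistic against a chi-squared distribution with h degrees of freedom, a step omitted here because it needs a statistics library. The sketch below is not the implementation cited as [96]; the function names and the toy series are assumptions.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def portmanteau_statistics(series, max_lag):
    """Box-Pierce and Ljung-Box statistics for lags 1..max_lag."""
    n = len(series)
    rho = autocorrelations(series, max_lag)
    box_pierce = n * sum(r * r for r in rho)
    ljung_box = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return box_pierce, ljung_box

# Hypothetical toy series: eight weeks of daily counts with a dip every seventh day.
series = [100, 100, 100, 100, 100, 100, 60] * 8
box_pierce, ljung_box = portmanteau_statistics(series, max_lag=7)
lag7 = autocorrelations(series, 7)[6]
print(box_pierce, ljung_box, lag7)
```

The Ljung-Box statistic is always the larger of the two because each squared autocorrelation is weighted by (n + 2) / (n − k), which exceeds one.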

We next inquire whether the activity time series is stationary. Existing work on smaller social networks [127], such as Gab, reveals that the activity time series changes drastically in response to socio-political events occurring outside the network. Hence, we test for stationarity of the time series, i.e. whether a single unchanging distribution produces the series, using an implementation [96] of the Augmented Dickey-Fuller test with both a constant term and a trend term. Again, we check for a lag of up to 185 days so as to be able to account for any seasonal changes in distribution (quarterly or semi-annual). For upwards of 250 observations (we have 366), the critical value of the test is −3.42 when using a constant and a trend term at the 5% significance level. If the test statistic is more negative than the critical value, we reject the null hypothesis of a unit root and conclude that the series is stationary. The “number of tweets” time series of the English verified users returns a test statistic of −3.86, which is more negative than the critical value, thus suggesting stationarity.

We further confirm this finding using a time series change-point detection mechanism called Pruned Exact Linear Time (PELT) [54]. We assume that this time series is drawn from a normal distribution, with mean and variance that can change at a discrete number of change-points. We use the PELT algorithm to maximize the log-likelihood for the means and variances of the changing underlying distribution with a penalty for the number of change-points. Results from several runs of the algorithm are recorded while cooling down the penalty factor and ramping up the number of change-points. Dates that fall in the change-point list in a significant number of runs of the algorithm are considered viable change-point candidates. We are reliably able to obtain only two change-points: one slightly before Christmas (23rd - 25th December 2017) and another one at the beginning of the summer (around the first week of April). This backs our initial assertion that activity patterns of the English verified Twittersphere are mostly resilient to socio-political events external to the network, especially since our collection period consists of the months leading up to significant global events such as the 2018 FIFA World Cup. This aligns with prior work [77] demonstrating that other aspects of the Twitter network, such as its topology, are resilient to exigent circumstances such as natural disasters.

4.5 Conclusion and Future Work

We studied 231,235 English-speaking verified Twitter user profiles and the 79,213,811 social connections between them. We characterized their network structure and analyzed user activity patterns in the data collected over a span of one year, from July 2017 to June 2018. We observe strong evidence for the presence of a power-law in the out-degree and Laplacian eigenvalue distributions in the Twitter network of English verified users. This marks a deviation from findings on the entire Twitter network. Other aspects in which our network deviates from the generic Twittersphere are the lower average degree of separation, higher reciprocity, and a large number of attracting components. We also demonstrate how the centrality of a user within this sub-graph is indicative of their influence and reach on Twitter. We have also found that the activity levels of English verified users are largely unaffected by current events extraneous to the network.

The above-mentioned deviations likely constitute a unique fingerprint for verified users which can be leveraged to discern between a verified and a non-verified user. This can further help evaluate the strength of an unverified user’s case for getting verified. These network signatures might also be leveraged for realistic synthetic network generation in the future.


Chapter 5

Discerning Verified Users

Chapter Guide

5.1 Overview
    5.1.1 Motivation
    5.1.2 Research Questions
    5.1.3 Contributions
5.2 Related Work
5.3 Inferring Verified Status
    5.3.1 Feature Importance Analysis
    5.3.2 Clustering and characterization
5.4 Comparative Topical Usage Analysis
    5.4.1 Content Topics
    5.4.2 Topical Span
5.5 Conclusion


5.1 Overview

The increased relevance of social media in our daily life has been accompanied by an exigent demand for a means to affirm the authenticity and authority of content sources. This challenge becomes even more apparent during the dissemination of real-time or breaking news, whose arrival on such platforms often precedes eventual traditional media reportage [29, 61]. In line with this need, major social networks such as Twitter, Facebook and Instagram have incorporated a verification process to authenticate handles they deem important enough to be worth impersonating. Usually conferred on accounts of well-known public personalities and businesses, verified status is indicated with a badge next to the screen name. Twitter’s verification policy [112] states that an account is verified if it belongs to a personality or business deemed to be of sufficient public interest in diverse fields, such as journalism, politics, sports, etc. However, the exact decision-making process behind evaluating the strength of a user’s case for verification remains a trade secret. This work attempts to unravel the likely factors that strengthen a user’s case for verification by delving into the aspects of a user’s Twitter presence that most reliably predict platform verification.

5.1.1 Motivation

Our motivation behind this work was two-fold and is elaborated below.

Lack of procedural clarity and imputation of bias: Despite repeated statements by Twitter about verification not being equivalent to an endorsement, aspects of the process, namely the rarity of the status and its prominent visual signalling [114], have led users to conflate authenticity and credibility. This perception was confirmed in full public view when Twitter was backed into suspending its requests for verification in response to being accused of granting verified status to political extremists¹, with the insinuation being that the verified badge lent their otherwise extremist opinions a facade of mainstream credibility.

This, however, engendered accusations of Twitter’s verification procedure harbouring a liberal bias. Multiple tweets imputing the same gave rise to the hashtag #VerifiedHate. Similar insinuations have been made by right-leaning Indian users of the platform in the lead-up to the 2019 Indian General Elections under the hashtag #ProtestAgainstTwitter. These hitherto unsubstantiated allegations of bias prompted us to delve deeper into understanding what may be driving the process and to infer whether these claims were justified or whether the difference in status could be explained by less insidious factors relating to a user’s profile and content.

Positive perception and coveted nature: Despite having its detractors, the fact remains that a verified badge is highly coveted amongst public figures and influencers. This is with good reason: although the badge is intended only as a mark of authenticity, prior work in social sciences and psychology points to verified badges conferring additional credibility on a handle’s posted tweets [13, 35, 83]. Psychological testing [36] has also revealed that the credibility of a message and its reception is influenced by its purported source and presentation rather than just its pertinence or credulity. Captology studies [32] back up this claim, indicating that widely endorsed information originating from a well-known source is easier to perceive as trustworthy. This is pertinent as owners of verified accounts are usually renowned, and their content is on average more frequently liked and retweeted than that of the generic Twittersphere [102, 107].

¹ https://www.bbc.com/news/technology-41934831

Adding to the desirability of exclusive visual indicators is the demanding nature of credibility assessment on Twitter. The imposed character limit and the minimal scope for visually customising content, coupled with the feverish rate at which content is consumed (users on average devote a mere three seconds of attention per tweet [25]), make users resort to heuristics to judge online content. There is substantial work on heuristic-based models for online credibility evaluation [16, 42, 105]. Particularly relevant to this inquiry are the endorsement heuristic, under which credibility is conferred on a source by a recognised endorsement (e.g. a verified badge), and the consistency heuristic, which stems from endorsements by several authorities (e.g. a user verified on one platform is likely to be verified on others).

Unsurprisingly, a verified status is highly sought after by preeminent entities, as evidenced by the prevalence of get-verified-quick schemes such as promoted tweets from the now-suspended account ‘@verified845’ [11, 111]. Our work attempts to obtain actionable insights into the verification process, thus providing entities looking to get verified a means to strengthen their case.

5.1.2 Research Questions

The aforementioned motivating factors pose a few avenues of research inquiry that we attempt to answer, which are detailed below.

RQ1: Can the verification status of a user be predicted from profile metadata and tweet contents? If so, what are the most reliably discriminative features?

RQ2: Do any inconsistencies exist between verified and non-verified users with respect to peripheral aspects like the choice and variety of topics they choose to tweet?

5.1.3 Contributions

Our contributions can be summarised as follows:

• We motivate and propose the problem of predicting the verification status of a Twitter user.

• We detail a framework extracting a substantial set of features from data and meta-data about social media users, including friends, tweet content and sentiment, activity time series, and profile trajectories. We make this dataset of 407,165 users and 494 million tweets publicly available upon publication of the work.

• Additionally, we factor state-of-the-art bot detection analysis into our predictive model. We use these features to train highly accurate models capable of discerning a user’s verified status. For a general user, we are able to provide a zero to one score representing their likelihood of being verified on Twitter.

• We report the most informative features in discriminating verified users from non-verified ones and also shed light on the manner in which the span and gamut of topic coverage between their tweets differs.

The rest of the chapter is organised as follows. In Section 5.2 we detail relevant prior work, hence putting our work in perspective. In Section 5.3 and Section 5.4, we conduct a comparative analysis between verified and non-verified users, addressing RQ1 and RQ2 respectively, and attempt to uncover features that can reliably classify them. We conclude with a brief summary in Section 5.5.

5.2 Related Work

Previous studies have focused on measuring user impact on social networks. As user impact might be a critical factor in deciding who gets verified on Twitter [112], it is important to study how certain users in particular networks have more impact/influence as compared to the others. Cha et al. [15] studied the dynamics of influence on Twitter based on three key measures: in-degree, retweets, and user-mentions. They show that in-degree alone is not sufficient to measure the influence of a user on Twitter. Bakshy et al. [5] demonstrate that URLs from users who have been influential in the past tend to generate larger cascades on the Twitter follower graph. They also show that URLs that are considered more interesting and that kindle positive emotions spread more. Canali et al. [12] identify key users on social networks who are important sources or targets for content disseminated online. They use a dimensionality-reduction based technique and conduct experiments with YouTube and Flickr datasets to obtain results which outperform the existing solutions by 15%. The novelty of their approach is that they use feature-rich user profiles rather than limiting themselves to network information. On the other hand, Lampos et al. [63] predict user impact on Twitter using features, such as user statistics and tweet content, that are under the control of the user. They experiment with both linear and nonlinear prediction techniques and find that Gaussian Process based models perform the best for the prediction task. Klout [56] was a service that measured the influence of a person using information from multiple social networks. Their initial framework [93] used long-lasting (e.g., in-degree, PageRank centrality, recommendations etc.) and dynamic features (reactions to a post such as retweets, upvotes etc.) to estimate the influence of a person across nine different social networks.

Further studies have tried to classify users based on factors such as celebrity status, socioeconomic status etc. Lampos et al. [62] classify the socioeconomic status of users on Twitter as high, middle or lower, using features such as tweet content, topics of discussion, interaction behaviour, and user impact. They obtain an accuracy of 75% using a nonlinear, generative learning approach with a composite Gaussian Process kernel. Preoţiuc-Pietro et al. [91] present a Gaussian Process regression model, which predicts the income of the user on Twitter. They examined factors that help characterise user income on Twitter and analyse their relation with emotions, sentiments, perceived psycho-demographics, and language used in posts. Further, Marwick et al. [73] qualitatively study the behaviours of celebrities on Twitter and how they impact the creation and sharing of content online. They aim to conceptualise “celebrity as a practice” in terms of personal information revelation, language usage, interactions, and affiliation with followers, among other things. There are also other studies that try to characterise usage patterns [1] and personalities [106] of varied users on Twitter.

Multiple existing studies attempt to detect and analyse automated activity on Twitter [17, 18, 30, 38, 118, 128] and differentiate bot activity from human or partial-human activity. For instance, Chu et al. [18] identify users on Twitter that generate automated content, with the verification badge serving as a key feature for the purpose. Holistically characterising features that resemble automated activity, and the extent to which exhibiting them can hurt a user’s case for verification, is further explored in Subsection 5.3.2.

Past studies on verified accounts have focused on elucidating their behaviours and properties on Twitter. Hentschel et al. [46] analyse verified users on Twitter and further use this information to identify trustworthy “regular” (not fake or spam) Twitter users. Castillo et al. [13] attempt to identify credible tweets based on a variety of profile features, including whether the user was verified by the platform or not. Along similar lines, Morris et al. [83] examined factors that influence profile credibility perceptions on Twitter. They found that possessing an authenticated status is one of the most robust predictors of positive credibility. Paul et al. [88] performed multiple network analyses of the verified accounts present on Twitter and revealed how they diverge from earlier results on the network as a whole. To summarise, there exists a rich body of literature establishing the enhancement of credibility and perceived importance with which a verified badge endows a user. However, no prior work, to the best of our knowledge, has attempted to characterise attributes that make the aforementioned status more attainable.

5.3 Inferring Verified Status

We commence our analysis by eliminating all features that could be deemed surplus to requirements. To this end, we employed an all-relevant feature selection model [60], which classifies features into three categories: confirmed, tentative and rejected. We only retain features that the model is able to confirm over 100 iterations.
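The confirmed/tentative/rejected split described above can be illustrated with a deliberately simplified shadow-feature procedure. This is a sketch only: it scores features by absolute Pearson correlation with the target rather than by the importance measure of the selection model cited as [60], and the function names, the 0.9 and 0.1 thresholds and the toy data are all assumptions made for illustration.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_selection(features, target, iterations=100,
                     confirm_ratio=0.9, reject_ratio=0.1, seed=0):
    """Label each feature by how often it beats the best shuffled copy.

    features: dict mapping a feature name to a list of values.
    The thresholds are hypothetical; real all-relevant selectors use a
    statistical test instead of fixed ratios.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(iterations):
        # Shadow features: shuffled copies that carry no real signal.
        shadow_best = 0.0
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, abs_corr(shuffled, target))
        # A feature scores a hit when it strictly beats every shadow.
        for name, values in features.items():
            if abs_corr(values, target) > shadow_best:
                hits[name] += 1
    decisions = {}
    for name, count in hits.items():
        ratio = count / iterations
        if ratio >= confirm_ratio:
            decisions[name] = "confirmed"
        elif ratio <= reject_ratio:
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Toy data: "signal" equals the target, "noise" is uncorrelated with it.
target = [0, 1] * 20
toy = {"signal": list(target), "noise": [0, 0, 1, 1] * 10}
decisions = shadow_selection(toy, target)
```

Only features labelled "confirmed" would be carried forward under this scheme, mirroring the retention rule stated in the text.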

To evaluate the effectiveness of our framework in discerning verification status of users, we examine five classification performance metrics – precision, recall, F1-score, accuracy and area under ROC curve – for five classifiers. The first two methods, intended to establish baselines, were a Logistic Regressor and a Support Vector Classifier. Further, three methods were used to gauge how far the classification performance could be pushed using the features we collected. These were (1) a Generalised Additive Model trained by nested iterations, setting all terms to smooth, (2) a Multi-Layered Perceptron with 3 hidden layers of 100, 30 and 10 neurons respectively, using Adam as an optimiser and ReLU as activation and (3) a state-of-the-art gradient boosting tool, XGBoost, with a maximum tree depth of 6


Dataset                      Classifier                                Precision  Recall  F1-Score  Accuracy  ROC AUC Score
Original imbalanced data     Logistic Regression                       0.86       0.86    0.86      0.859     0.854
                             Support Vector Classifier                 0.89       0.89    0.89      0.887     0.883
                             Generalized Additive Model¹               0.97       0.98    0.98      0.975     0.976
                             3-Hidden layer NN (100,30,10) ReLU+Adam   0.98       0.98    0.98      0.983     0.977
                             XGBoost Classifier                        0.99       0.99    0.99      0.989     0.990
ADASYN class rebalancing     (classifier label missing)                0.89       0.89    0.89      0.891     0.891
                             (classifier label missing)                0.97       0.97    0.97      0.974     0.973
                             (classifier label missing)                0.96       0.96    0.96      0.959     0.957
SMOTETomek class rebalancing (classifier label missing)                0.90       0.90    0.90      0.903     0.901
                             (classifier label missing)                0.98       0.97    0.98      0.974     0.974
                             (classifier label missing)                0.97       0.97    0.97      0.966     0.968

Table 5.1: Summary of classification performance of various approaches using metadata, temporal and contextual features on the original and balanced datasets.

¹ The generalized additive models were trained using all smooth terms.


and a learning rate of 0.2. The results obtained are detailed in Table 5.1. The first batch of results is obtained by training on the original unadulterated training split. Even without rectifying class distribution biases, we are able to attain a high classification accuracy of 98.9% on our most competitive classifier.
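For reference, the five reported metrics can be computed from first principles as sketched below. The sketch assumes binary labels with 1 denoting a verified user and reports precision, recall and F1 for the positive class only; whether the figures in Table 5.1 are per-class or averaged values is not stated, so that choice, like the function names, is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, 1 being the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 (positive class) and overall accuracy."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank interpretation.

    Equals the probability that a random positive example is scored
    above a random negative one, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: the classifier's zero to one verification likelihoods.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC computation is quadratic in the number of users and is meant for clarity; a rank-based implementation would be preferable at the scale of the dataset described here.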

The second and third batches are trained on data rectified for class imbalance using the adaptive synthetic over-sampling method (ADASYN) and a hybrid over- and under-sampling method (SMOTETomek), respectively. The ADASYN algorithm generates samples based on the feature space of the minority class data points and is a powerful method that has seen success across many domains [45] in neutralising the deleterious effects of class imbalance. The SMOTETomek algorithm combines the above over-sampling strategy with an under-sampling approach called Tomek link removal [110] to remove any bias introduced by over-sampling. This rectification did improve results, generally enhancing the performance of our two baseline choices and especially helping us inch closer to perfect performance with gradient boosting. However, particularly surprising was the detrimental effect of class re-balancing on the MLP classifier, which in all likelihood also learned the non-salient patterns in the re-balanced data. Also unexpectedly, the ADASYN re-balancing outperformed the more sophisticated SMOTETomek re-balancing in pushing the performance limits of the support vector (89.1% accuracy) and gradient boosting (99.1% accuracy) approaches. This might be because the Tomek link removal method omits informative samples close to the classification boundary, thus affecting the learned support vectors and decision tree splits.
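The two ingredients of the hybrid re-balancing described above can be sketched in a few lines. The code is an illustration under stated assumptions, not the pipeline used in this work: it shows the linear interpolation step behind SMOTE-style over-sampling (ADASYN additionally adapts how many samples each minority point receives) and a brute-force Tomek link search. Dropping only the majority-class member of each link is one common variant and is assumed here; other variants remove both members.

```python
import math


def interpolate(a, b, t):
    """Synthetic point on the segment from a to b, with t in [0, 1]."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def nearest_neighbour(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        dist = math.dist(points[i], q)
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def tomek_links(points, labels):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbour(i, points)
        if (j is not None and j > i and labels[i] != labels[j]
                and nearest_neighbour(j, points) == i):
            links.append((i, j))
    return links


def remove_tomek_majority(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.add(i if labels[i] == majority_label else j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy one-dimensional layout: label 0 is the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0)]
labels = [0, 0, 0, 1, 1]
links = tomek_links(points, labels)          # [(2, 3)]
cleaned_points, cleaned_labels = remove_tomek_majority(points, labels, 0)
```

In the toy layout the only link straddles the class boundary, which illustrates the explanation offered in the text: the samples removed are precisely the ones nearest the decision boundary.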

Our results suggest that near-perfect classification of the Twitter user verification status is possible without resorting to complex deep-learning pipelines that sacrifice interpretability.

5.3.1 Feature Importance Analysis

To compare the usefulness of various categories of features, we trained a gradient boosting classifier, our most competitive model, using each group of features alone. While we achieved the best performance with user metadata features, content features were not far behind. Evaluated on multiple randomised train-test splits of our dataset, user metadata and content features were both able to consistently surpass 0.88 AUC. Additionally, temporal features alone are able to attain an AUC of over 0.79 consistently.

The individual feature importances were determined using the Gini impurity reduction metric output by the gradient boosting model trained on the unmodified dataset. In the interest of ranking the most critical features reliably, the model was trained 100 times with varying combinations of hyperparameters (column sub-sampling, data sub-sampling and tree child weight) and the features determined to be the most important were noted. The most reliably discriminative features and their normalised density distributions over the values they attain are detailed in Figure 5.1. These features generally exhibit intuitive patterns of separation based on which an informed prediction can be attempted, e.g., the very highest echelons of public list membership counts are populated exclusively by verified users while the very low extremes of a propensity for authoritative speech as indicated by LIWC Clout summary scores are exclusively displayed by non-verified users.

43

Figure 5.1: Normalized density estimations of the six most discriminative features for verified (blue) and non-verified users (red).

The top 6 features are sufficient to reach a performance of 0.9 AUC in their own right, and the top 10 features are sufficient to further push those numbers up to 0.93. This is mainly because substantial redundancy was observed among sets of highly correlated features, such as some linguistic features (a tendency to use long words and impersonal pronouns correlates highly with high Analytic LIWC summary scores) and temporal trajectory features (most ascendant users score highly in both the 1 month and 3 month features in terms of tweets authored and followers gained).


5.3.2 Clustering and characterization

In order to characterise accounts with a higher resolution than a binary verification status will permit, we apply K-Means++ on the normalised user vectors, selecting the 30 most discriminative features indicated by the XGBoost model – our most competitive classifier. We settle on eight different clusters based on evaluating the inflexion point of the clustering inertia curve, and the proportion of variance explained. In the interest of an intuitive visualisation, two-dimensional embeddings obtained using the t-SNE dimensionality reduction method [71] are presented. When the perplexity metric is tuned appropriately, the method considers the similarity of data points in our feature space and embeds them in a manner that reflects their proximity in the feature space. The embeddings are plotted, and our classifier responses for members of the different clusters are detailed in Figure 5.2.
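One way to operationalise the choice of cluster count described above is sketched below. How the inflexion point was located in this work is not specified, so the rule used here (the largest second difference of the inertia curve over evenly spaced candidate values) is an assumption, as are the function names and the inertia values, which are invented purely to exercise the code.

```python
def inertia(points, centroids, assignment):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[a]))
        for p, a in zip(points, assignment)
    )


def explained_variance(inertia_value, total_sum_of_squares):
    """Proportion of variance explained by a clustering."""
    return 1.0 - inertia_value / total_sum_of_squares


def elbow(ks, inertias):
    """Candidate k at which the inertia curve bends most sharply.

    Uses the discrete second difference, so it assumes the candidate
    values in ks are evenly spaced and sorted in increasing order.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        drop_before = inertias[i - 1] - inertias[i]
        drop_after = inertias[i] - inertias[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Invented inertia values for illustration only (not results from this work).
candidate_ks = [2, 4, 6, 8, 10, 12]
toy_inertias = [1000.0, 700.0, 500.0, 320.0, 300.0, 285.0]
chosen_k = elbow(candidate_ks, toy_inertias)  # 8 for these toy numbers
```

The second-difference rule is sensitive to noise in the inertia curve, which is one reason to cross-check it against the proportion of variance explained, as the text does.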

Figure 5.2: t-SNE embeddings of accounts coloured by cluster. The distribution of verification probabilities by cluster, as predicted by our classifier, is faceted on the right.

Investigating these clusters allows us to unravel further combinations of attributes that strengthen a user’s case for verification. Clusters C0 and C2 are composed nearly exclusively of non-verified users. Cluster C0 can primarily be characterised as the Twitter layman with a high proportion of experiential tweets. This narrative further plays out in our collected features with members of this cluster on average having short tweets, high incidence of verb usage and scoring exceptionally high in the LIWC Authenticity summary. Cluster C2 can be characterised as an amalgamation of accounts exhibiting bot-like behaviour. Members of this cluster scored highly on the complete, network and content automation scores in our feature set. Furthermore, members in C2 possessed attributes previously linked to spammers such as copious usage of hashtags [126] and external links [119]. Manual inspection verified the substantial presence of automated content such as local weather updates in this cluster. Unsurprisingly, members of this cluster were predicted to possess the lowest verification probability by our classifier.

Cluster  Population  Accuracy  ROC AUC Score
C0       19462       0.996     0.989
C1       26259       0.986     0.986
C2       19356       0.994     0.984
C3       46178       0.988     0.987
C4       90843       0.989     0.987
C5       105701      0.993     0.986
C6       39248       0.990     0.989
C7       60118       0.987     0.986

Table 5.2: Classification performance of our most competitive model broken down by cluster.

The composition of clusters C4 and C6 leans towards verified users, with members of C4 having a tendency to post longer tweets and to retweet more frequently than they author content, while members of C6 almost exclusively retweet on the platform, with slightly over 93% of their content being retweets. Cluster C5 is nearly entirely comprised of verified users and includes elite Twitterati that constitute the core of verified users on the platform. These users have by far the highest list memberships on average while also scoring very highly on the LIWC Clout summary. Predictably, members of this cluster were predicted to possess the highest verification probability by our classifier.

The remaining clusters C1, C3 and C7 are comprised of a mix of verified and non-verified users. However, further inspection revealed that they have very divergent trajectories. Members of cluster C1 are ascendant both in terms of reach and activity levels as evidenced by the proportion of their followers gained and statuses authored in the last one and three months of our collection period. These members can be said to constitute a nouveau-elite group of users. This is further backed up by the fact that these users lack in their presence in public lists as compared to the very established elite in cluster C5. A manual inspection also verifies that many of these users have attained verification during our collection period. This is in stark contrast with members of C3 and C7 who are either stagnant or declining in their reach and activity levels and show shallow engagement with the rest of the platform in terms of retweets and mentions. Remarkably, our classifier is able to make this distinction and rates members of C1 as slightly better candidates for verification on average than members of C3 or C7. The relative difficulty of classifying users in these mixed clusters is demonstrated in the performance breakdown detailed in Table 5.2.

Classifier                                Precision  Recall  F1-Score  Accuracy  ROC AUC Score
Generalized Additive Model (GAM)¹         0.83       0.83    0.83      0.832     0.831
3-Hidden layer NN (100,30,10) ReLU+Adam   0.88       0.88    0.88      0.882     0.880

Table 5.3: Summary of classification performance of various approaches on inferred topics.

¹ The generalized additive models were trained using all smooth terms.

5.4 Comparative Topical Usage Analysis

Having deduced important predictive features present in a user’s metadata, linguistic style and activity levels over time with respect to verification status, we next investigate the presence of similar predictive patterns in the choice and variety of tweet topic usage amongst users.

5.4.1 Content Topics

Figure 5.3: Normalized density estimations of usage for the six most discriminative topics for verified (blue) and non-verified users (red). Listed alongside are the top three most probable keywords for each topic.

In order to obtain a topical breakdown of a user’s tweets in an unsupervised manner, we ran the Gibbs sampling based Mallet implementation of Latent Dirichlet Allocation (LDA) [74] with 1000 iterations of sampling. Narrowing down on the correct number of topics T required us to execute multiple runs of the model while varying our choices for the number of topics. The model was executed for 30, 50, 100, 150 and 300 topics, and the likelihood estimates were noted. It must be mentioned that in all cases the likelihood estimates stabilised well within the 1000 iteration limit we set. The likelihood keeps rising in value up to T = 100 topics, after which it sees a decline. This kind of profile is often seen when varying the hyperparameter of a statistical model, with the optimal model being rich enough to fit the information available in the data, yet not sufficiently complex to begin fitting noise. This led us to conclude that the tweets we collected over a year are best accounted for by incorporating 100 separate topics. We set document-topic density α = T/50 and topic-word density β = 0.01, which are the default settings recommended in prior studies [40] and maintain the sum of the Dirichlet hyperparameters, which can be interpreted as the number of virtual samples contributing to the smoothing of the topic distribution, as constant. The chosen value of β is small enough to permit a fine-grained breakdown of tweet topics covering various conversational areas.
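The model-selection logic described above (fit several topic counts, confirm that each run has stabilised, keep the count with the highest likelihood) can be sketched as follows. The helper names, the stabilisation rule and every number in the example are hypothetical; the log-likelihood values in particular are placeholders shaped like the profile described in the text, not measurements from this work.

```python
def has_stabilised(trace, window=50, tolerance=1e-3):
    """Check whether a per-iteration log-likelihood trace has levelled off.

    Compares the mean of the last `window` values with the mean of the
    preceding `window` values; the rule and its defaults are assumptions.
    """
    if len(trace) < 2 * window:
        return False
    recent = sum(trace[-window:]) / window
    previous = sum(trace[-2 * window:-window]) / window
    if recent == 0:
        return previous == 0
    return abs(recent - previous) / abs(recent) < tolerance


def select_num_topics(log_likelihood_by_topics):
    """Return the topic count whose converged log-likelihood is highest."""
    return max(log_likelihood_by_topics, key=log_likelihood_by_topics.get)


# Placeholder values only: rising up to 100 topics, declining afterwards.
toy_log_likelihoods = {
    30: -9.1e6,
    50: -8.8e6,
    100: -8.5e6,
    150: -8.6e6,
    300: -8.9e6,
}
best_num_topics = select_num_topics(toy_log_likelihoods)  # 100 for these values
```

Comparing raw training likelihoods across topic counts is a simple criterion; a held-out likelihood or perplexity would be a natural alternative where over-fitting is a concern.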

We again commenced the prediction by pruning down our topical feature set using the all-relevant feature selection method [60] we used earlier in Section 5.3. This allowed us to home in on the 76 topics that were confirmed to be predictive of verification status. To evaluate the effectiveness of our framework in discerning verification status of users from topic cues, we examine five classification performance metrics – precision, recall, F1-score, accuracy and area under ROC curve – for the three classifiers that were most competitive in our previous classification task. These were (1) a Generalised Additive Model trained by nested iterations, setting all terms to smooth, (2) a Multi-Layered Perceptron with 3 hidden layers of 100, 30 and 10 neurons respectively, using Adam as an optimiser and ReLU as activation and (3) the gradient boosting tool XGBoost with a maximum tree depth of 5 and a learning rate of 0.3. The results obtained are detailed in Table 5.3. The results demonstrate that it is eminently possible to infer the verification status of a user with high accuracy purely from the distribution of topics they tweet about. The MLP classifier was the most competitive in this task, reliably pushing past 88.2% accuracy.

In the interest of interpretability, we evaluate the predictive power of each topic with respect to the classification target. To this end, we obtain individual topic importances using the ANOVA F-Scores output by GAM – our second most competitive model on this task. In order to rank the features reliably, the procedure is run on 50 random train-test splits of the dataset, and the topics with the highest F-Scores are noted. The most reliably discriminative topics and the normalised density distributions of their usage are detailed in Figure 5.3. Some redundancy was observed in the way of multi-collinearity owing to multiple topics chiefly belonging to popular and broad conversational categories such as sports and politics. This is further backed up by the fact that the top 15 most discriminative topics alone can discern verification status with an AUC of 0.76, while the top 25 topics can push those numbers up to an AUC of 0.8, nearly approximating the GAM performance on the whole feature set (AUC 0.83). These topics generally exhibit intuitive patterns of separation based on which an informed prediction can be made, e.g., the users who tweet most frequently about climate change are all verified, while controversial topics like middle-east geopolitics are something towards which verified users prefer to devote limited attention.

5.4.2 Topical Span

Peripheral aspects of topics such as their geographical distribution [94] and the viability of embeddings they induce for sentiment analysis [94] tasks have been explored before. This prompted us to extend our inquiry into peripheral measures such as inconsistencies in the variety and number of topics about which the two classes of users tweet. In order to obtain an optimal mix of the number of topics per user in an unsupervised manner, we used a Hierarchical Dirichlet Process (HDP) model implementation [120] for topic inference. This method streams our corpus of tweets and performs an online Variational Bayes estimation to converge at an optimal number of topics T, for each user. Once again, we set α = T/50 and β = 0.01, which are the default settings recommended in existing studies [40].

The distribution of cardinality for topic sets by verification status is detailed in Figure 5.4. Inspection of the distribution uncovers a clear trend, with non-verified users being over-represented in the lower reaches of the distribution (1–4 topics), while a comparatively substantial portion of verified users is situated in the middle of the distribution (5–10 topics). Also noteworthy is the fact that the very upper echelons of topical variety in tweets are occupied solely by verified users. We posit that this may be because news handles (e.g., @BBC: 13 topics) and content aggregators (e.g., @GIFs: 21 topics) are over-represented in the set of verified users. The validation of this assertion is left for future work.
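The comparison described above reduces to a per-class histogram over the number of topics inferred for each user, with proportions square-root scaled for display. A minimal sketch follows; the function names are hypothetical, the input lists are toy values, and the "11+" bucket label is introduced here for illustration (the text only names the 1–4 and 5–10 ranges).

```python
import math
from collections import Counter


def topic_count_proportions(topic_counts):
    """Proportion of users observed at each distinct number of topics."""
    counts = Counter(topic_counts)
    total = len(topic_counts)
    return {k: counts[k] / total for k in sorted(counts)}


def sqrt_scaled(proportions):
    """Square-root scaling, which makes sparsely populated counts visible."""
    return {k: math.sqrt(p) for k, p in proportions.items()}


def bucket(num_topics):
    """Coarse range for a topic count, following the ranges in the text."""
    if num_topics <= 4:
        return "1-4"
    if num_topics <= 10:
        return "5-10"
    return "11+"


# Toy per-user topic counts for each class (illustrative values only).
verified_counts = [5, 6, 8, 13, 21, 3]
non_verified_counts = [1, 2, 2, 3, 4, 6]

verified_scaled = sqrt_scaled(topic_count_proportions(verified_counts))
non_verified_buckets = Counter(bucket(n) for n in non_verified_counts)
```

Computing the proportions separately per class, rather than over the pooled population, keeps the comparison independent of the class imbalance between verified and non-verified users.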

Figure 5.4: Square-root scaled proportion of users by optimal number of topics.

5.5 Conclusion

The coveted nature of platform verification on Twitter has led to the proliferation of verification scams and accusations of systemic bias against specific ideological demographics. Our work attempts to uncover actionable intelligence on the inner workings of the verification system, effectively formulating a checklist of profile attributes a user can work to improve upon to render verification more attainable.

This chapter presents a framework that computes the strength of a user’s case for verification on Twitter. We introduce our machine learning system that extracts a multitude of features per user, belonging to different classes: user metadata, tweet content, temporal signatures, expressed sentiment, automation probabilities and preferred topics. We also categorise the users in our dataset into intuitive clusters and detail the reasons behind their likely divergent outcomes from the verification procedure. Additionally, we demonstrate the role that a user’s choice and variety of conversational topics play in precluding or effecting verification.


Our framework represents the first attempt of its kind at discerning and characterising verification-worthy users on Twitter and is able to attain a near-perfect classification performance of 99.1% AUC. We believe this framework will empower the average Twitter user to significantly enhance the quality and reach of their online presence without resorting to prohibitively priced social media management solutions.


Chapter 6

Conclusions

Chapter Guide

6.1 Overview
6.2 Applications
6.3 Future Work
6.4 Limitations


6.1 Overview

This thesis uncovers the various aspects of a user’s Twitter presence that have a bearing on their verification status. Our work finds the Twitter verified status to be strongly indicative of influence in the Twitter network as a whole, as well as being strongly predicted by factors such as public list membership and clout expression in tweets, which have previously been shown to be useful predictors of other stylized influence metrics. An interesting corollary to our findings regarding the strong correlations observed between a user’s centrality in the verified network and conventional metrics of platform reach is that the elite users seem to have the greatest tendency to make content viral and hence are also the most prized targets in spreading misinformation, as a significant proportion of information on social graphs flows through the shortest paths [55, 65, 84].

Another captivating and equally encouraging outcome of our analyses is the finding that the difference in verification status can be explained by organic aspects of how users present themselves online, such as shying away from controversial topics, discoursing about a wide span of topics and avoiding negative sentiment in tweets. These are actionable aspects of a user’s presence that they can change, rather than more abstract and personal aspects of a user’s existence such as religious or political beliefs. Users imputing any notion of bias or randomness to the process [8, 57] would be well advised to fix the basics of their profile first to give themselves the best possible chance.

Finally, we cannot gloss over the obvious implications of our findings, which the thesis set out to demonstrate in the first place: that verification is a marker of more than just authenticity. That the badge is seen as a marker of endorsement and authority comes as no surprise given our findings regarding how the status seems to have been selectively extended to users exhibiting a predictable set of attributes that usually signify a broad platform audience, authority of speech and expertise in a wide variety of topics.

6.2 Applications

The current and future applications of this work are outlined in the following text:

• Superior verification heuristic: The aforementioned deviations likely constitute a unique fingerprint for verified users, which can be leveraged to gauge the strength of a user’s case for such status

• Alternate influence measure: Centrality and connectivity within the Twitter verified network may be utilized as a surrogate influence measure

• Realistic synthetic network/influential profile generation: Network studies could replicate the unique fingerprints unravelled here so as to perform realistic simulations


• Actionable insights to improve online presence: Obtained insights can be used to enhance the quality and reach of one’s online presence significantly before resorting to prohibitively priced social media management solutions

6.3 Future Work

The logical future extensions of the research inquiry conducted in this thesis can be detailed as follows:

• Regular timed snapshots of the evolution of the verified user network can be collected, and contagion based models can be utilized to model the spread of the status.

• Interactions such as likes and retweets of verified user content and back and forth on conversation trees with those users can be modelled as events. Survival analysis can be used to model the transition from an unverified status to a verified one, as a function of the aforementioned events.

• The embedding of verified users in the network of all Twitter users can be used to quantify the extent to which verified users accelerate information cascades on the platform. This can be used to target branding efforts or to stymie misinformation campaigns.

6.4 Limitations

The limitations of this work are outlined in the following text:

• Due to limitations regarding sentiment analysis and topic modelling, we had to restrict our inquiry to English-speaking users on the platform

• Due to time and resource constraints, we had to restrict our analysis to a one-year period which coincided with several controversial online events as well as a phase of constant upheaval in Twitter’s verification policy


Related Publications

1. PAUL, I., KHATTAR, A., KUMARAGURU, P., GUPTA, M., AND CHOPRA, S. Elites Tweet? Characterizing the Twitter Verified User Network. ICDE Workshop on Large Scale Graph Data Analytics (2019)

[Accepted]

2. PAUL, I., KHATTAR, A., CHOPRA, S., KUMARAGURU, P., AND GUPTA, M. What sets Verified Users apart? Insights, Analysis and Prediction of Verified Users on Twitter. WebSci (2019)

[Accepted]


Bibliography

[1] AL MARUF, H., MESHKAT, N., ALI, M. E., AND MAHMUD, J. Human behaviour in different social medias: A case study of twitter and disqus. In 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2015), IEEE, pp. 270–273. 41

[2] ANUTA, D., CHURCHIN, J., AND LUO, J. Election bias: Comparing polls and twitter in the 2016 us election. arXiv preprint arXiv:1701.06232 (2017). 20

[3] APPLING, D. S., BRISCOE, E. J., AND HUTTO, C. J. Discriminative models for predicting deception strategies. In Proceedings of the 24th International Conference on World Wide Web (2015), ACM, pp. 947–952. 20

[4] BAKHSHANDEH, R., SAMADI, M., AZIMIFAR, Z., AND SCHAEFFER, J. Degrees of separation in social networks. In Fourth Annual Symposium on Combinatorial Search (2011). 29

[5] BAKSHY, E., HOFMAN, J. M., MASON, W. A., AND WATTS, D. J. Everyone’s an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining (2011), ACM, pp. 65–74. 6, 40

[6] BATISTA, G. E., PRATI, R. C., AND MONARD, M. C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD explorations newsletter 6, 1 (2004), 20–29. 22

[7] BATOOL, R., KHATTAK, A. M., MAQBOOL, J., AND LEE, S. Precise tweet classification and sentiment analysis. In 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS) (2013), IEEE, pp. 461–466. 20

[8] BISHOP, R. Verified is now a derogatory term on Twitter. https://theoutline.com/post/1323/verified-blue-checkmark-derogatory-insult-twitter?zd=1&zi=rlmvbsag, 2019. Accessed: 2019-07-15. 53

[9] BOLLEN, J., MAO, H., AND ZENG, X. Twitter mood predicts the stock market. Journal of computational science 2, 1 (2011), 1–8. 20

[10] BONCHI, F. Influence propagation in social networks: A data mining perspective. IEEE Intelligent Informatics Bulletin 12, 1 (2011), 8–16. 6




[11] BUSTLE. This Twitter Verification Scam Was Promoted By Twitter Itself, And The Consequences Are Terrifying. https://www.bustle.com/p/this-twitter-verification-scam-was-promoted-by-twitter-itself-the-consequences-are-terrifying-7833920, 2018. Accessed: 2018-12-27. 39

[12] CANALI, C., AND LANCELLOTTI, R. A quantitative methodology based on component analysis to identify key users in social networks. International Journal of Social Network Mining 1, 1 (2012), 27–50. 40

[13] CASTILLO, C., MENDOZA, M., AND POBLETE, B. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web (2011), ACM, pp. 675–684. 26, 38, 41

[14] CATALDI, M., DI CARO, L., AND SCHIFANELLA, C. Emerging topic detection on twitter based on temporal and social terms evaluation. In Proceedings of the tenth international workshop on multimedia data mining (2010), ACM, p. 4. 20

[15] CHA, M., HADDADI, H., BENEVENUTO, F., AND GUMMADI, K. P. Measuring user influence in twitter: The million follower fallacy. In fourth international AAAI conference on weblogs and social media (2010). 6, 40

[16] CHAIKEN, S. Heuristic versus systematic information processing and the use of source versus message cues in persuasion. Journal of personality and social psychology 39, 5 (1980), 752. 2, 25, 39

[17] CHAVOSHI, N., HAMOONI, H., AND MUEEN, A. Identifying correlated bots in twitter. In International Conference on Social Informatics (2016), Springer, pp. 14–21. 41

[18] CHU, Z., GIANVECCHIO, S., WANG, H., AND JAJODIA, S. Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing 9, 6 (2012), 811–824. 26, 41

[19] CHUN, H., KWAK, H., EOM, Y.-H., AHN, Y.-Y., MOON, S., AND JEONG, H. Comparison of online social relations in volume vs interaction: a case study of cyworld. In Proceedings of the 8th ACM SIGCOMM conference on Internet measurement (2008), ACM, pp. 57–70. 29

[20] CHUNG, F., LU, L., AND VU, V. Eigenvalues of random power law graphs. Annals of Combinatorics 7, 1 (2003), 21–33. 27

[21] CLAUSET, A., SHALIZI, C. R., AND NEWMAN, M. E. Power-law distributions in empirical data. SIAM review 51, 4 (2009), 661–703. 12, 28

[22] COHN, M. A., MEHL, M. R., AND PENNEBAKER, J. W. Linguistic markers of psychological change surrounding september 11, 2001. Psychological science 15, 10 (2004), 687–693. 16




[23] COLUMBIA, U. S. D. C. F. T. D. O. United states of america vs internet research agency. https://www.justice.gov/file/1035477/download. Accessed: 2019-07-14. 2

[24] COPPERSMITH, G., HARMAN, C., AND DREDZE, M. Measuring post traumatic stress disorder in twitter. In Eighth international AAAI conference on weblogs and social media (2014). 21

[25] COUNTS, S., AND FISHER, K. Taking it all in? visual attention in microblog consumption. In Fifth International AAAI Conference on Weblogs and Social Media (2011). 25, 39

[26] CRAN. poweRlaw package. https://cran.r-project.org/web/packages/poweRlaw/. Accessed: 2018-12-22. 29

[27] DE CHOUDHURY, M., GAMON, M., COUNTS, S., AND HORVITZ, E. Predicting depression via social media. In Seventh international AAAI conference on weblogs and social media (2013). 21

[28] DEPHILLIPS, K. How Much Does Social Media Marketing Cost? https://www.contentfac.com/how-much-does-social-media-marketing-cost/, 2019. Accessed: 2019-07-12. 7

[29] DIAKOPOULOS, N., AND ZUBIAGA, A. Newsworthiness and network gatekeeping on twitter: The role of social deviance. In Eighth International AAAI Conference on Weblogs and Social Media (2014). 38

[30] DICKERSON, J. P., KAGAN, V., AND SUBRAHMANIAN, V. Using sentiment to detect bots on twitter: Are humans more opinionated than bots? In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2014), IEEE Press, pp. 620–627. 41

[31] EIKMEIER, N., AND GLEICH, D. F. Revisiting power-law distributions in spectra of real world networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017), ACM, pp. 817–826. 27

[32] ERDOGAN, B. Z. Celebrity endorsement: A literature review. Journal of marketing management 15, 4 (1999), 291–314. 25, 39

[33] FERRARA, E., VAROL, O., DAVIS, C., MENCZER, F., AND FLAMMINI, A. The rise of social bots. Communications of the ACM 59, 7 (2016), 96–104. 3

[34] FERRARA, E., AND YANG, Z. Measuring emotional contagion in social media. PloS one 10, 11 (2015), e0142390. 13, 20

[35] FLANAGIN, A. J., AND METZGER, M. J. The role of site features, user attributes, and information verification behaviors on the perceived credibility of web-based information. New media & society 9, 2 (2007), 319–342. 25, 38


[36] FOGG, B. J., SOOHOO, C., DANIELSON, D. R., MARABLE, L., STANFORD, J., AND TAUBER, E. R. How do users evaluate the credibility of web sites?: a study with over 2,500 participants. In Proceedings of the 2003 conference on Designing for user experiences (2003), ACM, pp. 1–15. 25, 38

[37] FULLER, W. A. Introduction to statistical time series, vol. 428. John Wiley & Sons, 2009. 10

[38] GILANI, Z., FARAHBAKHSH, R., TYSON, G., WANG, L., AND CROWCROFT, J. An in-depth characterisation of bots and humans on twitter. arXiv preprint arXiv:1704.01508 (2017). 41

[39] GONZALEZ-IBANEZ, R., MURESAN, S., AND WACHOLDER, N. Identifying sarcasm in twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2 (2011), Association for Computational Linguistics, pp. 581–586. 21

[40] GRIFFITHS, T. L., AND STEYVERS, M. Finding scientific topics. Proceedings of the National academy of Sciences 101, suppl 1 (2004), 5228–5235. 47, 49

[41] GRINBERG, N., JOSEPH, K., FRIEDLAND, L., SWIRE-THOMPSON, B., AND LAZER, D. Fake news on twitter during the 2016 us presidential election. Science 363, 6425 (2019), 374–378. 3

[42] GUPTA, A., KUMARAGURU, P., CASTILLO, C., AND MEIER, P. Tweetcred: Real-time credibility assessment of content on twitter. In International Conference on Social Informatics (2014), Springer, pp. 228–243. 2, 39

[43] HAMDAN, H., BELLOT, P., AND BECHET, F. lsislif: Feature extraction and label weighting for sentiment analysis in twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (2015), pp. 568–573. 22

[44] HE, H., BAI, Y., GARCIA, E. A., AND LI, S. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (2008), IEEE, pp. 1322–1328. 14, 22

[45] HE, H., AND GARCIA, E. A. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering, 9 (2008), 1263–1284. 43

[46] HENTSCHEL, M., ALONSO, O., COUNTS, S., AND KANDYLAS, V. Finding users we trust: Scaling up verified twitter users using their communication patterns. In Eighth International AAAI Conference on Weblogs and Social Media (2014). 26, 41

[47] HONG, L., AND DAVISON, B. D. Empirical study of topic modeling in twitter. In Proceedings of the first workshop on social media analytics (2010), ACM, pp. 80–88. 22


[48] HUTTO, C. J., AND GILBERT, E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth international AAAI conference on weblogs and social media (2014). 20

[49] INFLUENCE, C. HOW BIG AN ISSUE ARE INFLUENTIAL UNVERIFIED TWITTER ACCOUNTS? https://influenceonline.co.uk/2017/09/21/big-issue-influential-unverified-twitter-accounts/, 2018. Accessed: 2019-07-15. 5

[50] JAMIESON, K. H. Cyberwar: How Russian Hackers and Trolls Helped Elect a President What We Don’t, Can’t, and Do Know. Oxford University Press, 2018. 2

[51] JAVA, A., SONG, X., FININ, T., AND TSENG, B. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis (2007), ACM, pp. 56–65. 26

[52] KACEWICZ, E., PENNEBAKER, J. W., DAVIS, M., JEON, M., AND GRAESSER, A. C. Pronoun use reflects standings in social hierarchies. Journal of Language and Social Psychology 33, 2 (2014), 125–143. 16

[53] KANG, C. In washington pizzeria attack, fake news brought real guns. https://www.nytimes.com/2016/12/05/business/media/comet-ping-pong-pizza-shooting-fake-news-consequences.html. Accessed: 2019-07-14. 2

[54] KILLICK, R., FEARNHEAD, P., AND ECKLEY, I. A. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107, 500 (2012), 1590–1598. 11, 36

[55] KIMURA, M., AND SAITO, K. Tractable models for information diffusion in social networks. In European conference on principles of data mining and knowledge discovery (2006), Springer, pp. 259–271. 53

[56] KLOUT. Klout. https://www.lithium.com/products/klout, 2019. Accessed: 2019-02-16. 40

[57] KOMARAGIRI, A. Twitter Has an India Problem. https://medium.com/humanrightscenter/twitter-has-an-india-problem-c91ba406724a, 2019. Accessed: 2019-07-12. 53

[58] KOWALCZYK, C. M., AND POUNDERS, K. R. Transforming celebrities through social media: the role of authenticity and emotional attachment. Journal of Product & Brand Management 25, 4 (2016), 345–356. 6


[59] KUBAT, M., MATWIN, S., ET AL. Addressing the curse of imbalanced training sets: one-sided selection. In ICML (1997), vol. 97, Nashville, USA, pp. 179–186. 15, 22

[60] KURSA, M. B., RUDNICKI, W. R., ET AL. Feature selection with the boruta package. J Stat Softw 36, 11 (2010), 1–13. 41, 48

[61] KWAK, H., LEE, C., PARK, H., AND MOON, S. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web (2010), ACM, pp. 591–600. 26, 27, 29, 31, 38

[62] LAMPOS, V., ALETRAS, N., GEYTI, J. K., ZOU, B., AND COX, I. J. Inferring the socioeconomic status of social media users based on behaviour and language. In European Conference on Information Retrieval (2016), Springer, pp. 689–695. 40

[63] LAMPOS, V., ALETRAS, N., PREOTIUC-PIETRO, D., AND COHN, T. Predicting and characterising user impact on twitter. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (2014), pp. 405–413. 40

[64] LEE, C., KWAK, H., PARK, H., AND MOON, S. Finding influentials based on the temporal order of information adoption in twitter. In Proceedings of the 19th international conference on World wide web (2010), ACM, pp. 1137–1138. 20

[65] LERMAN, K., AND GHOSH, R. Information contagion: An empirical study of the spread of news on digg and twitter social networks. In Fourth International AAAI Conference on Weblogs and Social Media (2010). 13, 53

[66] LESKOVEC, J., AND HORVITZ, E. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web (2008), ACM, pp. 915–924. 29

[67] LESKOVEC, J., AND MCAULEY, J. J. Learning to discover social circles in ego networks. In Advances in neural information processing systems (2012), pp. 539–547. 20

[68] LIN, Y., RAZA, A. A., LEE, J.-Y., KOUTRA, D., ROSENFELD, R., AND FALOUTSOS, C. Influence propagation: Patterns, model and a case study. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (2014), Springer, pp. 386–397. 6

[69] LIU, Q., XIANG, B., CHEN, E., XIONG, H., TANG, F., AND YU, J. X. Influence maximization over large-scale social networks: A bounded linear approach. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (2014), ACM, pp. 171–180. 6

[70] LJUNG, G. M., AND BOX, G. E. On a measure of lack of fit in time series models. Biometrika 65, 2 (1978), 297–303. 10


[71] MAATEN, L. V. D., AND HINTON, G. Visualizing data using t-sne. Journal of machine learning research 9, Nov (2008), 2579–2605. 45

[72] MAHMUD, J., NICHOLS, J., AND DREWS, C. Where is this tweet from? inferring home locations of twitter users. In Sixth International AAAI Conference on Weblogs and Social Media (2012). 20

[73] MARWICK, A., AND BOYD, D. To see and be seen: Celebrity practice on twitter. Convergence 17, 2 (2011), 139–158. 41

[74] MCCALLUM, A. K. Mallet: A machine learning for language toolkit. 22, 47

[75] MCCOLLISTER, C., LUO, B., AND HUANG, S. Building topic models to predict author attributes from twitter messages. In CLEF (Working Notes) (2015). 22

[76] MCPHERSON, M., SMITH-LOVIN, L., AND COOK, J. M. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444. 27

[77] MENDOZA, M., POBLETE, B., AND CASTILLO, C. Twitter under crisis: Can we trust what we rt? In Proceedings of the first workshop on social media analytics (2010), ACM, pp. 71–79. 36

[78] MIHAIL, M., AND PAPADIMITRIOU, C. On the eigenvalue power law. In International Workshop on Randomization and Approximation Techniques in Computer Science (2002), Springer, pp. 254–262. 27

[79] MILGRAM, S. The small world problem. Psychology today 2, 1 (1967), 60–67. 29

[80] MISLOVE, A., LEHMANN, S., AHN, Y.-Y., ONNELA, J.-P., AND ROSENQUIST, J. N. Understanding the demographics of twitter users. In Fifth international AAAI conference on weblogs and social media (2011). 18

[81] MIURA, Y., SAKAKI, S., HATTORI, K., AND OHKUMA, T. Teamx: A sentiment analyzer with enhanced lexicon mapping and weighting scheme for unbalanced data. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (2014), pp. 628–632. 22

[82] MOCANU, D., BARONCHELLI, A., PERRA, N., GONCALVES, B., ZHANG, Q., AND VESPIGNANI, A. The twitter of babel: Mapping world languages through microblogging platforms. PloS one 8, 4 (2013), e61981. 18

[83] MORRIS, M. R., COUNTS, S., ROSEWAY, A., HOFF, A., AND SCHWARZ, J. Tweeting is believing?: understanding microblog credibility perceptions. In Proceedings of the ACM 2012 conference on computer supported cooperative work (2012), ACM, pp. 441–450. 25, 26, 38, 41


[84] MYERS, S. A., SHARMA, A., GUPTA, P., AND LIN, J. Information network or social network?: the structure of the twitter follow graph. In Proceedings of the 23rd International Conference on World Wide Web (2014), ACM, pp. 493–498. 53

[85] NEPUSZ, T. Fitting power-law distributions to empirical data. https://github.com/ntamas/plfit. Accessed: 2018-12-22. 28

[86] NEWMAN, M., BARABASI, A.-L., AND WATTS, D. J. The structure and dynamics of networks, vol. 12. Princeton University Press, 2011. 27

[87] NEWMAN, M. L., PENNEBAKER, J. W., BERRY, D. S., AND RICHARDS, J. M. Lying words: Predicting deception from linguistic styles. Personality and social psychology bulletin 29, 5 (2003), 665–675. 16

[88] PAUL, I., KHATTAR, A., KUMARAGURU, P., GUPTA, M., AND CHOPRA, S. Elites tweet? characterizing the twitter verified user network. ICDE Workshop on Large Scale Graph Data Analytics (2019). 41

[89] PENNEBAKER, J. W., BOYD, R. L., JORDAN, K., AND BLACKBURN, K. The development and psychometric properties of liwc2015. Tech. rep., 2015. 15, 21

[90] PENNEBAKER, J. W., CHUNG, C. K., FRAZEE, J., LAVERGNE, G. M., AND BEAVER, D. I. When small words foretell academic success: The case of college admissions essays. PloS one 9, 12 (2014), e115844. 16

[91] PREOTIUC-PIETRO, D., VOLKOVA, S., LAMPOS, V., BACHRACH, Y., AND ALETRAS, N. Studying user income through language, behaviour and affect in social media. PloS one 10, 9 (2015), e0138717. 40

[92] RAMTEKE, J., SHAH, S., GODHIA, D., AND SHAIKH, A. Election result prediction using twitter sentiment analysis. In 2016 international conference on inventive computation technologies (ICICT) (2016), vol. 1, IEEE, pp. 1–5. 20

[93] RAO, A., SPASOJEVIC, N., LI, Z., AND DSOUZA, T. Klout score: Measuring influence across multiple social networks. In 2015 IEEE International Conference on Big Data (Big Data) (2015), IEEE, pp. 2282–2289. 40

[94] REN, Y., ZHANG, Y., ZHANG, M., AND JI, D. Improving twitter sentiment classification using topic-enriched multi-prototype word embeddings. In Thirtieth AAAI conference on artificial intelligence (2016). 49

[95] SCHOENEBECK, G. Potential networks, contagious communities, and understanding social network structure. In Proceedings of the 22nd international conference on World Wide Web (2013), ACM, pp. 1123–1132. 27


[96] SEABOLD, S., AND PERKTOLD, J. Statsmodels: Econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference (2010), vol. 57, Scipy, p. 61. 35

[97] SEMERTZIDIS, K., PITOURA, E., AND TSAPARAS, P. How people describe themselves on twitter. In Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (2013), ACM, pp. 25–30. 26

[98] SHANE, S. The fake americans russia created to influence the election. https://www.nytimes.com/2017/09/07/us/politics/russia-facebook-twitter-election.html. Accessed: 2019-07-14. 2

[99] SHANE, S. Russian 2016 influence operation targeted african-americans on social media. https://www.nytimes.com/2018/12/17/us/politics/russia-2016-influence-campaign.html. Accessed: 2019-07-14. 2

[100] SHAO, C., CIAMPAGLIA, G. L., VAROL, O., FLAMMINI, A., AND MENCZER, F. The spread of fake news by social bots. arXiv preprint arXiv:1707.07592 (2017), 96–104. 3

[101] SHARMA, N. K., GHOSH, S., BENEVENUTO, F., GANGULY, N., AND GUMMADI, K. Inferring who-is-who in the twitter social network. ACM SIGCOMM Computer Communication Review 42, 4 (2012), 533–538. 34

[102] STATISTA. Most popular tweets on Twitter as of November 2018, by number of retweets. https://www.statista.com/statistics/699462/twitter-most-retweeted-posts-all-time/. Accessed: 2018-12-22. 25, 39

[103] STENQVIST, E., AND LONNO, J. Predicting bitcoin price fluctuation with twitter sentiment analysis, 2017. 20

[104] STEYVERS, M., SMYTH, P., ROSEN-ZVI, M., AND GRIFFITHS, T. Probabilistic author-topic models for information discovery. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (2004), ACM, pp. 306–315. 22

[105] SUNDAR, S. S. The main model: A heuristic approach to understanding technology effects on credibility. Digital media, youth, and credibility 73100 (2008). 2, 25, 39

[106] TADESSE, M. M., LIN, H., XU, B., AND YANG, L. Personality predictions based on user behavior on the facebook social media platform. IEEE Access 6 (2018), 61959–61969. 41

[107] TECHACUTE. Top 40 List of the Most-Liked Tweets on Twitter. https://techacute.com/list-most-liked-tweets/. Accessed: 2018-12-22. 25, 39

[108] PAUL, I., KHATTAR, A., CHOPRA, S., KUMARAGURU, P., AND GUPTA, M. What sets Verified Users apart? Insights, Analysis and Prediction of Verified Users on Twitter. WebSci (2019).


[109] PAUL, I., KHATTAR, A., KUMARAGURU, P., GUPTA, M., AND CHOPRA, S. Elites Tweet? Characterizing the Twitter Verified User Network. ICDE Workshop on Large Scale Graph Data Analytics (2019).

[110] TOMEK, I. Two modifications of cnn. IEEE Trans. Systems, Man and Cybernetics 6 (1976), 769–772. 43

[111] TRIPWIRE. Get Verified Through a Promoted Tweet? Nope. Its a Scam! https://www.tripwire.com/state-of-security/latest-security-news/get-verified-promoted-tweet-nope-scam/, 2019. Accessed: 2019-1-29. 39

[112] TWITTER. About Verified Accounts: Twitter Help 2018. https://help.twitter.com/en/managing-your-account/about-twitter-verified-accounts. Accessed: 2018-12-22. 18, 25, 38, 40

[113] TWITTER. Verified account FAQs. https://help.twitter.com/en/managing-your-account/twitter-verified-accounts, 2018. Accessed: 2018-12-22. 5, 18

[114] TWITTER. Twitter Support Statement. https://twitter.com/TwitterSupport/status/930926124892168192, 2019. Accessed: 2019-1-22. 38

[115] VAROL, O., FERRARA, E., DAVIS, C. A., MENCZER, F., AND FLAMMINI, A. Online human-bot interactions: Detection, estimation, and characterization. In Eleventh international AAAI conference on web and social media (2017). 22

[116] VOLKOVA, S., SHAFFER, K., JANG, J. Y., AND HODAS, N. Separating facts from fiction: Linguistic models to classify suspicious and trusted news posts on twitter. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (2017), pp. 647–653. 21

[117] VUONG, Q. H. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society (1989), 307–333. 12

[118] WANG, A. H. Detecting spam bots in online social networking sites: a machine learning approach. In IFIP Annual Conference on Data and Applications Security and Privacy (2010), Springer, pp. 335–342. 41

[119] WANG, A. H. Don’t follow me: Spam detection in twitter. In 2010 international conference on security and cryptography (SECRYPT) (2010), IEEE, pp. 1–10. 46

[120] WANG, C., PAISLEY, J., AND BLEI, D. Online variational inference for the hierarchical dirichlet process. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (2011), pp. 752–760. 49


[121] WARRINER, A. B., KUPERMAN, V., AND BRYSBAERT, M. Norms of valence, arousal, and dominance for 13,915 english lemmas. Behavior research methods 45, 4 (2013), 1191–1207. 20

[122] WATTS, D. J., AND STROGATZ, S. H. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440. 29

[123] WENG, J., LIM, E.-P., JIANG, J., AND HE, Q. Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining (2010), ACM, pp. 261–270. 22, 26

[124] WU, L., AND LIU, H. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (2018), ACM, pp. 637–645. 3

[125] WU, S., HOFMAN, J. M., MASON, W. A., AND WATTS, D. J. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web (2011), ACM, pp. 705–714. 18

[126] YARDI, S., ROMERO, D., SCHOENEBECK, G., ET AL. Detecting spam in a twitter network. First Monday 15, 1 (2010). 46

[127] ZANNETTOU, S., BRADLYN, B., DE CRISTOFARO, E., KWAK, H., SIRIVIANOS, M., STRINGINI, G., AND BLACKBURN, J. What is gab: A bastion of free speech or an alt-right echo chamber. In Companion of the The Web Conference 2018 on The Web Conference 2018 (2018), International World Wide Web Conferences Steering Committee, pp. 1007–1014. 35

[128] ZHANG, C. M., AND PAXSON, V. Detecting and analyzing automated activity on twitter. In International Conference on Passive and Active Network Measurement (2011), Springer, pp. 102–111. 41

[129] ZUBIAGA, A., SPINA, D., FRESNO, V., AND MARTINEZ, R. Classifying trending topics: a typology of conversation triggers on twitter. In Proceedings of the 20th ACM international conference on Information and knowledge management (2011), ACM, pp. 2461–2464. 22


