
LETTERS | PUBLISHED: 26 JUNE 2017 | VOLUME: 1 | ARTICLE NUMBER: 0132

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

NATURE HUMAN BEHAVIOUR 1, 0132 (2017) | DOI: 10.1038/s41562-017-0132 | www.nature.com/nhumbehav

Limited individual attention and online virality of low-quality information

Xiaoyan Qiu1,2†, Diego F. M. Oliveira2*†, Alireza Sahami Shirazi3, Alessandro Flammini2,4 and Filippo Menczer2,3,4

Social media are massive marketplaces where ideas and news compete for our attention1. Previous studies have shown that quality is not a necessary condition for online virality2 and that knowledge about peer choices can distort the relationship between quality and popularity3. However, these results do not explain the viral spread of low-quality information, such as the digital misinformation that threatens our democracy4. We investigate quality discrimination in a stylized model of an online social network, where individual agents prefer quality information, but have behavioural limitations in managing a heavy flow of information. We measure the relationship between the quality of an idea and its likelihood of becoming prevalent at the system level. We find that both information overload and limited attention contribute to a degradation of the market’s discriminative power. A good tradeoff between discriminative power and diversity of information is possible according to the model. However, calibration with empirical data characterizing information load and finite attention in real social media reveals a weak correlation between quality and popularity of information. In these realistic conditions, the model predicts that low-quality information is just as likely to go viral, providing an interpretation for the high volume of misinformation we observe online.

Four centuries ago, the English poet John Milton argued that in a free and open encounter of ideas, truth prevails5. Since then the concept of a free marketplace of ideas has been used to support free speech policies and even applied to the study of scientific research6. The theory draws analogies to natural selection, where the traits of a species determine its survival, and to economic markets, where the intrinsic value of a good determines its success. Two necessary elements of this theory are the diversity of ideas to which people are exposed and the discriminative power of the marketplace, which we define as its ability to allow better ideas to become more popular.

The recent advent of social media as a major communication platform is having a significant impact on the marketplace by broadening participation and facilitating the contribution and exchange of information and opinions. We use the terms ‘idea’ and ‘meme’ interchangeably to mean a transmissible piece of information7. A meme can represent a link to a news article, a phrase, a hashtag, or a video or image. Through networks such as Twitter and Facebook, users are exposed daily to a large number of memes that compete to attain success. However, cognitive constraints limit the number of social interactions we can sustain8 and the number of ideas we can consider, giving rise to an ‘attention economy’1,9–11. The information flows that result from such complex dynamics have increasingly consequential implications for politics and policy12–14, making the questions of discrimination and diversity more important in today’s online information markets than ever before.

Here, we study discriminative power and diversity in a model information network, similar to modern social media, where memes are shared and spread from person to person. We assume the existence of an intrinsic measure of quality for each meme shared online, and explore how two critical factors, the number of competing memes and the finite attention of the participants, affect the system’s ability to select the best memes for survival and diffusion, while sustaining a diverse ecosystem of ideas. We observe a tradeoff between quality discrimination and diversity, in which both can be relatively high when agents have sufficient attention and are not overloaded with information. Calibrating the model to empirical data is difficult because qualities such as value, innovation, reliability and relevance of information can rarely be measured or even defined a priori, making it difficult to quantify the discriminative power of online information markets. However, it is possible to characterize the distributions of information load and attention from empirical social media data. Unfortunately, these measurements place online information networks in a regime where quality and popularity of information are weakly correlated, far from the optimal tradeoff.

Our work draws inspiration from previous work, both in economics and social science, that has studied the relation between ‘quality’ and popularity from both theoretical and empirical perspectives. Adler15 has shown that simple rationality arguments based on the cost of learning about quality will lead to ‘stars’ with disproportionate popularity even in the absence of differences in quality. Here, we assume that the cost of learning is zero; every agent can evaluate quality, although there is noise in the choice. Weng et al.2 demonstrated that some memes inevitably achieve viral popularity irrespective of quality in the presence of competition among networked agents with limited attention. Their model did not incorporate an individual preference for quality memes. In their seminal ‘music lab’ experiment3, Salganik et al. showed how the relation between quality and popularity can be distorted by introducing mechanisms that allow ‘consumers’ to choose knowing about the aggregated choices of their peers. We instead assume a networked system where new items to choose from are continuously introduced and where agents only have access to local information shared by their neighbours. We focus on the capacity of the system to let quality emerge, how this discrimination is affected by the cognitive limitations of individual agents and how it affects the level of diversity the system can support.

1School of Economics and Management, Shanghai Institute of Technology, 100 Haiquan Road, Fengxian District, Shanghai 201418, China. 2Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, 919 East 10th Street, Bloomington, Indiana 47408, USA. 3Yahoo Research, 701 1st Avenue, Sunnyvale, California 94089, USA. 4Indiana University Network Science Institute, 1001 IN-45, Bloomington, Indiana 47408, USA. †These authors contributed equally to this work. *e-mail: [email protected]


The simple model presented here does not explicitly incorporate many behavioural, social and technological mechanisms that affect discrimination and diversity in the online marketplace of ideas. For example, previous research has studied the role of technology in discriminating between truthful information and misinformation. It has been argued that truth is easier to distribute because one can more easily verify truthful sources, making online misinformation expensive to sustain16. However, the ease of disseminating misinformation through social media may counter this argument. The ‘wisdom of the crowd’ enabled by social media17 should also facilitate the discrimination of information on the basis of quality by combining the diverse opinions of many individuals18. But when people communicate, their opinions are no longer independent, leading to higher confidence and lower accuracy19.

Cognitive and behavioural processes for dealing with opinions that challenge one’s beliefs may decrease our capability to discriminate between high- and low-quality information20,21. For example, confirmation bias22 may have evolved as an effective strategy to avoid misinformation, by comparing incoming information with one’s own existing beliefs, and adopting it if it is sufficiently concordant23. However, in social media, such a bias easily leads to ineffective discrimination; strategies such as accepting new information if it comes from multiple sources24 are not useful because people lack knowledge of the social network structure necessary to determine whether multiple information sources are independent of each other. Confirmation bias may be reinforced online by our limited capacity to cope with the information overload caused by the messages that flood our screens25 and our consequent need to quickly discard irrelevant information.

The diversity of information present in the market can also be affected by the interplay between behavioural and cognitive factors and algorithmic biases of online social networks. It is easy to rewire our connections and affect the sources of information to which we are exposed26. Mechanisms such as triadic closure, facilitated by social media recommendation, may be suboptimal for the discovery of relevant but unfamiliar information27,28. These selection processes may cluster people into a few homogeneous factions29, often called ‘echo chambers’30 or ‘filter bubbles’31. This may further lead to polarization32–35; one group may automatically discount ideas from another36,37.

This body of work suggests that, paradoxically, our behavioural mechanisms to cope with information overload may make online information markets less meritocratic and diverse, increasing the spread of misinformation38,39 and making us vulnerable to manipulation40,41. Anecdotal evidence of hoaxes, conspiracy theories and fake news in online social media is so abundant that massive digital misinformation has been ranked among the top global risks for our society4, and fake news has become a major topic of debate in the United States and Europe.

Several studies have investigated the role played by network mechanisms affecting the popularity of individual memes. Crane and Sornette42 proposed an epidemic model on a social network to describe the exogenous and endogenous bursts of attention towards a video. Ratkiewicz et al.43 employed a model in which random collective shifts of attention due to exogenous events provide a way to interpret the broad distribution of magnitude in popularity bursts. Bingol44 proposed a dynamic model where agents can remember and forget, and use recommendation to discover new agents. This model predicts that the popularity of an agent is linearly related to memory size. Huberman45 studied the effects of the content’s novelty and popularity in attracting attention. Wu and Huberman46 developed a model with the novelty of a news story fading with time and showed that attention decays over a natural timescale. Lerman and colleagues47,48 showed that the combination of competition and position bias (a manifestation of limited attention in social media) affects the visibility of a meme and thus constrains social contagion.

The above literature considers the popularity of pieces of information in isolation. Markets in which many memes compete for the limited attention of social media users have received scarce consideration. A notable exception is the work of Weng et al.2, who used an agent-based model to demonstrate that the combination of a social network structure and the finite attention of social media users is sufficient for the emergence of viral memes. Gleeson et al.49,50 formalized this model as a critical branching process, predicting that the popularity of memes follows a power-law distribution with very heavy tails.

These results reveal that quality is not a necessary ingredient for explaining popularity patterns in online social networks, but they say nothing about the actual importance of information quality. It is reasonable to assume that quality does play a role in individual decisions about information consumption. This motivates our theoretical analysis to determine whether discrimination of information according to its quality at the individual level can be reflected in discriminative power at the system level, and at what cost in terms of the market’s capacity to sustain diversity of information.

[Figure 1: panel a, schematic of agents' feeds holding memes m1 to m9, with a new meme created with probability μ and one of α feed messages reshared with probability 1 − μ; panel b, log-log plot of P(popularity) versus popularity for the empirical data and for μ = 0.05, 0.1, 0.2, 0.4, 0.6 and 0.8.]

Figure 1 | Illustration of the meme diffusion model and predicted popularity distributions. a, At each time step, an agent is considered (blue node). The agent chooses to create and share a new meme (m9) with probability μ. Otherwise, with probability 1 − μ, the agent reshares one of α messages in its feed, to which it is currently paying attention (m6). The message is transmitted to the agent’s neighbours and appears at the top of their feeds. b, Distribution (probability density function) of meme popularity p. We compared the model predictions (α = 10) with an empirical distribution obtained by counting the number of occurrences of hashtags in a sample of public tweets. This empirical popularity was distributed according to a power-law distribution P(p) ≈ p^−1.94 (see black line as a guide for the eye).

We aim to examine the conditions in which the ‘best’ ideas are those that capture a greater portion of collective attention, and whether this happens at the expense of the diversity of ideas. To this end, we propose a simple agent-based model inspired by the long tradition of representing the spread of ideas as an epidemic process where messages are passed along the edges of a network51–55. Agents are represented by the nodes of a static network where the links embody social connections. The network dynamics in the model capture the salient ingredients common to popular social media platforms. Each message, or post, carries a ‘meme’ or ‘idea’, that is, the unit of information that spreads from person to person7. Different messages may carry the same meme. Fig. 1a illustrates the dynamics of the model.

To examine the discriminative power of the market, we imagine that each meme is characterized by an intrinsic quality value. Agents pay attention to memes shared by their neighbours. We assume that the probability that an agent shares one of these memes, allowing it to spread, is proportional to the meme’s quality. The quality might represent different properties that make the meme more likely to be shared, depending on the situation being modelled: the originality of an idea, the beauty of a picture, and the truthfulness of a claim are valid examples.

In contrast to classical epidemiological models, messages carrying new memes are continuously introduced into the system in an exogenous fashion. We use the rate μ at which this happens as a parameter of the model to regulate the information load of the agents, that is, the average number of memes received by an agent per unit time.

Agents produce messages containing new memes and reshare messages originated or forwarded by their neighbours. When resharing, an agent is capable of paying attention to only a finite number α of messages at a time. If we think of messages from neighbours as appearing in, say, reverse chronological order on a social media feed, a user during a session will scroll down the feed to view α recent posts. Further details about the model are presented in the Methods.
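As an illustration, the dynamics described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the ring network built by `make_ring_network`, the rule that an agent with an empty feed always creates a new meme, the uniform fallback when all feed qualities are zero, and the choice to count the initial post towards popularity are all assumptions introduced here. The uniform quality distribution, the random choice of agent and the quality-proportional reshare follow the description in the Methods.

```python
import random
from collections import deque


def make_ring_network(n, k=2):
    """Hypothetical helper: ring where each agent links to k neighbours on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def simulate(neighbours, mu, alpha, steps, seed=0):
    """Run the diffusion process; return per-meme quality and popularity lists."""
    rng = random.Random(seed)
    # Each feed keeps at most alpha messages, newest at index 0.
    feeds = {i: deque(maxlen=alpha) for i in neighbours}
    quality = []     # quality[m]: intrinsic quality of meme m
    popularity = []  # popularity[m]: number of times meme m was posted or reshared
    agents = list(neighbours)
    for _ in range(steps):
        i = rng.choice(agents)
        feed = list(feeds[i])
        # Assumption: an agent with an empty feed has nothing to reshare.
        if not feed or rng.random() < mu:
            meme = len(quality)
            quality.append(rng.random())  # quality drawn uniformly from [0, 1)
            popularity.append(0)
        else:
            weights = [quality[m] for m in feed]
            if sum(weights) > 0:
                # Reshare with probability proportional to meme quality.
                meme = rng.choices(feed, weights=weights, k=1)[0]
            else:
                meme = rng.choice(feed)  # assumption: uniform fallback
        popularity[meme] += 1
        for j in neighbours[i]:
            # A full deque drops its oldest message when a new one arrives.
            feeds[j].appendleft(meme)
    return quality, popularity
```

For example, `simulate(make_ring_network(128), mu=0.1, alpha=10, steps=100000)` returns two parallel lists that can be used to relate quality to popularity.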

Let us investigate how the information load affects the relationship between meme quality and success. The relationship is not trivial because memes are not shared on the sole basis of their quality; a meme with lower quality may be selected if it is over-represented in an agent’s feed. The actual probability that a meme is shared depends on a complex interplay of factors that include its current popularity, the network structure and the limited attention of the agents. There are multiple ways to define the success of a meme, for instance by its longevity. Here we measure its popularity, defined as the number of times the meme is shared across the network from the moment it is first injected until it finally disappears.

The distribution of meme popularity, shown in Fig. 1b, depends on μ. For high μ, the distribution is exponentially narrow and no memes go viral. As the information load becomes lighter (μ < 0.2), our model reproduces the broad distribution from the empirical data, indicating that a few memes spread virally through the population. In the absence of quality, we would expect a power-law distribution of popularity P(p) ≈ p^−β with exponent β = 1.5 (refs 49,50). However, fitting56 reveals a larger exponent β ≈ 1.94. This is consistent with a model of a branching process with uniform fitness, which predicts an exponent β = 2 (ref. 57).
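One common way to estimate a power-law exponent is by maximum likelihood. The sketch below uses the continuous approximation β ≈ 1 + n / Σ ln(x_i / x_min); it is offered as an illustration only and is not necessarily the fitting procedure of ref. 56. Popularity counts are discrete, so the continuous formula is an approximation, and the cutoff `x_min` and the helper `sample_powerlaw` are assumptions introduced here.

```python
import math
import random


def powerlaw_exponent_mle(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of beta for P(x) ~ x^-beta, x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(beta, n, x_min=1.0, seed=0):
    """Hypothetical helper: draw n continuous power-law samples by inverse transform."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1.0)) for _ in range(n)]
```

Applying `powerlaw_exponent_mle` to synthetic samples from `sample_powerlaw(1.94, 50000)` should recover an exponent close to 1.94, which gives a quick sanity check before using the estimator on simulated popularity counts.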

The relationship between meme quality and popularity is illustrated in Fig. 2a. On average, memes with higher quality do have a better chance of survival and success, but in a way that depends considerably on μ. A single meme survives in the limit μ = 0, typically but not necessarily one with high quality. For small values of μ (0 < μ ≪ 1), very high quality yields a disproportionately large chance of success. For large μ (0 ≪ μ < 1), the relative advantage conferred by higher quality is much smaller. In the limit of μ = 1, a new meme is introduced at all times and therefore there is no chance for memes to spread, irrespective of quality. In summary, an increase in information load corresponds to a decrease in discriminative power because quality has a lesser effect on popularity.

The effect of finite attention on the relationship between quality and popularity is illustrated in Fig. 2b. We assume α > 1 so that agents have some choice in selecting which memes to share. As expected, the mean popularity grows with quality: the best memes have much higher chances of winning. As the amount of individual attention α increases, the curves become more concave; the mean popularity grows more slowly except for the highest values of quality, an indication of increased selective pressure favouring the best memes.

We can summarize the dependency between the quality of memes and their success in a single discriminative power measure by looking at the correlation between quality and popularity. Since the two quantities are not normally distributed, we employ the Kendall rank correlation coefficient τ, which is computed by ranking memes according to the two criteria and then counting the number of meme pairs for which the two rankings are concordant or discordant, properly accounting for ties58. High τ indicates that fitter memes are more likely to win, granting the system discriminative power; in the extreme case where τ = 1, the two rankings are completely concordant. Small τ signifies a lack of quality discrimination by the network. Figure 3a shows that network discriminative power degrades both with higher information load and with more limited attention. Similar results are obtained when using mutual information in place of Kendall’s τ to measure discriminative power.
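The pair-counting definition above can be made concrete with a short sketch. It assumes the tie-corrected variant known as τ_b, which is one standard way of accounting for ties; the text does not state which variant ref. 58 prescribes. The quadratic loop over pairs is chosen for clarity rather than speed.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall rank correlation with the tau-b tie correction (O(n^2) pair count)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to no term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when one ranking is constant")
    return (concordant - discordant) / denom
```

Here `x` would hold meme qualities and `y` the corresponding popularities; popularity counts often repeat, which is why the tie correction matters.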

As discussed above, discriminative power in spreading quality content is a desirable property of a social network. A second desirable property of an ideal communication system is the preservation of information diversity, that is, the possibility of having many distinct memes alive simultaneously. As illustrated in Fig. 3c, the two goals are in contradiction: the price associated with the capability of the network to let a high-quality meme prevail is a loss in diversity, with many memes receiving relatively little attention despite their intrinsic quality. Let us therefore explore the tradeoff that results from the competition for attention in the network.

[Figure 2: two log-scale plots of mean popularity versus quality (0 to 1), one with curves for μ = 0.1, 0.2, 0.4, 0.6, 0.8 and 1.0, the other with curves for α = 2, 4, 8 and 50.]

Figure 2 | Average popularity of memes. a,b, The number of times a meme is shared (popularity) is plotted as a function of quality for different values of information load μ (α = 10) (a) and attention α (μ = 0.1) (b).

To measure the amount of diversity in the system at the steady state, we start from the entropy H = −∑_m P(m) log P(m), where P(m) is the portion of attention received by meme m, that is, the fraction of messages with m across all of the user feeds. The sum runs over all memes present at a given time and is averaged over a long period after stationarity has been achieved (see Methods). The minimum entropy is zero, when all nodes have the same meme (μ = 0). The maximum entropy, obtained in the extreme case μ = 1, depends on α. To discount this dependence, we measure diversity using the normalized entropy H/H(μ = 1). Figure 3b shows that with this normalization, the diversity does not depend in a significant way on the attention α. As expected, the diversity increases with information load and is maximized for high μ.
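The entropy at a single point in time can be computed directly from the contents of the feeds, as in the sketch below. It assumes feeds are given as lists of meme identifiers and uses the natural logarithm (the base cancels in the normalized ratio). The reference value H(μ = 1) is treated as an input that would have to be measured from a separate run with μ = 1, and the averaging over time described above is left out.

```python
import math
from collections import Counter


def attention_entropy(feeds):
    """Entropy of the attention shares P(m) over all messages in all feeds."""
    counts = Counter(meme for feed in feeds for meme in feed)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: empty feeds carry no diversity
    entropy = 0.0
    for count in counts.values():
        share = count / total  # P(m): fraction of messages carrying meme m
        entropy -= share * math.log(share)
    return entropy


def normalized_diversity(feeds, reference_entropy):
    """Diversity H / H(mu = 1); reference_entropy is assumed to be measured separately."""
    if reference_entropy <= 0:
        raise ValueError("reference entropy must be positive")
    return attention_entropy(feeds) / reference_entropy
```

When every feed holds the same meme the entropy is zero, and when every message carries a distinct meme it equals the logarithm of the number of messages.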

The tradeoffs between discriminative power and diversity are better illustrated in Fig. 4a. For any value of finite attention α > 1, we observe a transition from relatively high discriminative power and low diversity (when information load is low) to high diversity and low discriminative power (high information load). The amount of attention α has a significant effect on the tradeoff: for a given level of diversity, the discriminative power improves when people can pay attention to multiple memes, and vice versa, the network can sustain a larger diversity without loss in discriminative power. When α is large, there is a region where the network can sustain very high diversity with relatively little loss of discriminative power.

The model has two key parameters, the information load μ and the individual attention α. We have made two simplifying assumptions about these parameters: agents in our model introduce new memes at the same rate μ and they have the same amount of attention α. In the real world, some people may post more new memes, others may tend to reshare memes adopted through their connections; some may pay attention to only a handful of messages, others may scroll through social media feeds for prolonged periods. Let us turn to empirical data to calibrate these ingredients of the model.

We counted posts and reposts by a large sample of social media users to estimate the portions of original and reshared memes per user (see Methods). Figure 4b shows the resulting empirical distri-bution of the rate of introduction of new memes, which corresponds to the parameter μ in our model. The information load spans the entire spectrum (μ ∈ [0,1]) but is skewed towards high values with a peak at μ = 1, corresponding to users who post but do not reshare. The average is quite high (⟨ μ⟩ ≈ 0.75).
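A per-user estimate of this kind can be sketched as follows, assuming that μ for a user is the fraction of original posts among all of that user's posts and reposts. The exact estimator is given in the Methods and may differ; the function names and the input format (pairs of counts) are hypothetical.

```python
def new_meme_rate(n_original, n_reshared):
    """Fraction of a user's messages that introduce a new meme (assumed estimator of mu)."""
    total = n_original + n_reshared
    if total == 0:
        return None  # inactive user: no estimate possible
    return n_original / total


def mean_new_meme_rate(user_counts):
    """Average the per-user rates over users with at least one message."""
    rates = [new_meme_rate(o, r) for o, r in user_counts]
    rates = [rate for rate in rates if rate is not None]
    if not rates:
        raise ValueError("no active users")
    return sum(rates) / len(rates)
```

Under this assumption, a user who only posts original content gets μ = 1 and a user who only reshares gets μ = 0.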

The amount of attention one devotes to assessing information, ideas and opinions encountered in online social media depends not only on the individual but also on their circumstances at the time of assessment; the same user may be hurried one time and careful another. We counted the number of times that a user stops on a post during a scrolling session on a social blogging platform to estimate their finite attention (see Methods). This number corresponds to the parameter α in our model. Figure 4c shows that the resulting empirical distribution of α is broad, with average ⟨α⟩ ≈ 14.

A naïve way to take these data into account for calibrating our model market of ideas is to set the information load and attention parameters to their empirical averages, ⟨μ⟩ ≈ 0.75 and ⟨α⟩ ≈ 14, respectively. This yields the value of discriminative power τ shown in Fig. 4d, which is roughly 70% of the maximum τ obtained in the limit of high α and low μ. We can easily remove the simplifying assumption about constant μ and α using insight from the data. We therefore adopted a second calibration of the model by drawing both μ and α from the empirical distributions of information load and attention. We repeated the analysis with this more realistic model and found a significantly lower value of discriminative power, τ ≈ 0.15 (Fig. 4d). This finding suggests that the heterogeneous information load and attention of the real world lead to a market that is incapable of discriminating information on the basis of quality.

[Figure 3: panels a and b, colour maps over information load μ (0 to 1) and attention α (2 to 10); panel c, network snapshots for μ = 0.01, μ = 0.2 and μ = 0.9.]

Figure 3 | Discriminative power and diversity. a, Discriminative power τ (colour scale bar) as a function of information load and finite attention. b, Diversity H/H(μ = 1) (colour scale bar) as a function of intensity of information load and attention. c, Illustration of the tradeoff between the discriminative power of the system in spreading quality memes and diversity of content in the network. Nodes represent agents, their colour represents the last shared meme and their size indicates the quality of that meme (the bigger the node, the higher the quality). Edges represent social connections among agents, such as followers on Twitter or friends on Facebook. When the information load μ is small, only high-quality memes are present, with low diversity. As μ increases we observe higher diversity and lower discriminative power. Here N = 128 and α = 10.

Given the importance of the heterogeneity of information load and attention across users, we investigated the possibility of reproducing their empirical distributions as an outcome of information market mechanisms. We extended the model by taking into account how users scroll through their social media feeds. The scrolling model, illustrated in the Methods, reproduces the empirical distributions of μ and α as well as the decrease in discriminative power.

The low τ predicted by our model means that the system is unable to discriminate between low-quality and high-quality information. Figure 5 illustrates this finding by plotting the distributions of popularity for two groups of low-quality and high-quality memes, respectively. We observe that high-quality memes have no competitive advantage in terms of their chances of success.

Empirical validation of this model prediction requires the identification of some feature that translates into a quality metric for information that is shared online. While this is generally difficult, proxy measures of quality exist in some cases. We used empirical data from Emergent (http://www.emergent.info; see Methods) about posts shared on social media with links to news articles in two groups. Articles in one group support claims debunked by fact-checkers or undermine claims verified by fact-checkers. The other group includes articles that fact-check hoaxes or support verified claims. It is reasonable to assume that most people would consider articles in the second group as having higher quality than those in the first. We compared the numbers of times articles in these two groups were shared online. As illustrated in the inset of Fig. 5, the articles in the two groups are just as likely to go viral, as predicted by our model.

The proposed model is quite minimal and relies on few parameters, but it captures salient behavioural features that shape the diffusion of information in online social networks. This allows us to study how information load and limited attention affect the discriminative power of the network, that is, the likelihood that the best memes will succeed at reaching many people. Our main finding is that the survival of the fittest is far from a foregone conclusion where information is concerned. Both information load and limited attention lead to low discriminative power, so that it becomes very difficult for the best memes to win. Meme diversity can coexist with network discriminative power when we have plenty of attention and are not overloaded with information.

One important question that deserves further exploration is the role played by the network structure in determining market discriminative power and diversity. While the results presented here are robust to changes in network size, density and clustering (see Methods), the model could be further expanded to capture other characteristics derived from empirical social networks, such as the segregated communities that we typically observe around discussions of polarizing topics13,33. How the predictions of our model depend on these features remains to be investigated.

Empirical validation of the predictions generated by our model remains a challenge, given the difficulty of quantifying the factors that affect the intrinsic quality of a meme in the real world. However, we have shown that it is possible to derive empirical estimations of the key parameters in our model, namely the rate of introduction of new memes and the depth of user attention. According to these calibrations, real social media have heterogeneous levels of information load and attention, which place them in a regime of low discriminative power. If prior research had revealed that intrinsic quality is not a necessary ingredient to explain the broad distribution of meme popularity in social media2,49, the present results are not much more reassuring. They suggest that better memes do not have a significantly higher likelihood of becoming popular compared with low-quality information. The observation that hoaxes and fake news spread as virally as reliable information in online social media (Fig. 5, inset) is not too surprising in light of these findings.

Our results suggest that one way to increase the discriminative power of online social media would be to reduce information load by limiting the number of posts in the system. Currently, bot accounts controlled by software make up a significant portion of online profiles59, and many of them flood social media with high volumes of low-quality information to manipulate public discourse41,60. By aggressively curbing this kind of abuse, social media platforms could improve the overall quality of information to which we are exposed.

[Figure 4: panel a, τ versus H/H(μ = 1) for α = 2, 4, 8 and 50; panel b, P(μ) versus μ for data and model; panel c, log-log plot of P(α) versus α for data and for the model with σ = 0.01, 0.09 and 0.099; panel d, τ for the optimal case, for μ = ⟨μ⟩ and α = ⟨α⟩, and for the empirical distributions.]

Figure 4 | Tradeoff between discriminative power and diversity, and empirical calibration. a, Tradeoff between discriminative power τ and diversity H/H(μ = 1). For each value of the attention α, different tradeoffs are obtained by varying μ between 0 and 1. The dashed line suggests convergence in the limit of high α. b, Empirical distribution of μ measured from Twitter. The grey area is produced by the scrolling model described in the Methods section, with ρ = 0.978, ⟨q⟩ = 0.1 and σ = 0.09. c, Empirical distribution of attention α derived from Tumblr. The lines are produced by the scrolling model described in the Methods section, with ρ = 0.05 and ⟨q⟩ = 0.1. d, Discriminative power τ of the system for different empirical calibrations of information load μ and attention α. The optimal case is obtained for small information load (μ ≤ 0.5) and large attention (α ≥ 50).

[Figure 5: log-log plot of P(popularity) versus popularity for low-quality and high-quality memes, with an inset showing the same quantities for news articles.]

Figure 5 | Popularity distributions. Distribution of the number of times that memes in two categories are shared in our model. The top two-thirds of memes are considered high quality (f(m) ≥ 0.34) and the rest are considered low quality. Inset: distribution of the number of times that news articles in two categories are shared on Facebook: articles that support false claims or undermine true claims (low quality) and those that fact-check false claims or support true claims (high quality).

Methods

Diffusion model and simulation details. The basic setting for our model is a set of agents connected by a social network. Each agent holds a feed of the α most recent messages produced by their neighbours. The reverse chronological ordering of the feed is a realistic simplifying assumption, which is accurate in social media platforms such as Twitter. In some cases, such as Facebook, the ranking algorithm also considers factors such as popularity and social engagement. However, all platforms give strong priority to recent messages.

At each time step, one agent i is chosen at random. With probability μ, i produces a message carrying a new meme. The meme’s quality is drawn uniformly at random from the unit interval. Alternatively, with probability 1 − μ, i selects one of the messages in its feed. The probability that an agent selects a specific message from its feed is proportional to the meme’s quality. More explicitly, let M_i be the feed of i (|M_i| = α). The probability of message m ∈ M_i being selected is

$$P(m) = \frac{f(m)}{\sum_{j \in M_i} f(j)}$$

where f(m) is the quality of the meme carried by m. The message is added to the feeds of i’s neighbours; if a feed exceeds α messages, the oldest is forgotten. This mechanism represents how finite attention is allocated to information posted by one’s social connections.

The two parameters of the model allow us to explore how the intensity of the information load (μ) and the attention depth (α) interact with the intrinsic value of an idea and affect its chances of winning.
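The update rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' released code: the function name `simulate`, the adjacency-dictionary input and the seeding are hypothetical. Two details are assumptions not stated in the text: an agent with an empty feed always posts a new meme, and qualities are drawn from (0, 1] rather than [0, 1] so that the selection weights are never all zero.

```python
import random
from collections import deque


def simulate(neighbours, mu, alpha, steps, seed=0):
    """Run the meme diffusion model.

    neighbours: dict mapping each agent to an iterable of its neighbours.
    Returns (quality, shares): dicts keyed by meme id.
    """
    rng = random.Random(seed)
    agents = list(neighbours)
    # A deque with maxlen=alpha drops its oldest entry when a new one is
    # appended to a full feed, which mirrors "the oldest is forgotten".
    feeds = {i: deque(maxlen=alpha) for i in agents}
    quality, shares = {}, {}

    for _ in range(steps):
        i = rng.choice(agents)
        feed = feeds[i]
        # Assumption: with nothing to reshare, the agent posts a new meme.
        if not feed or rng.random() < mu:
            meme = len(quality)
            quality[meme] = 1.0 - rng.random()  # uniform on (0, 1]
            shares[meme] = 0
        else:
            # Pick a message with probability proportional to meme quality.
            weights = [quality[m] for m in feed]
            meme = rng.choices(list(feed), weights=weights, k=1)[0]
        shares[meme] += 1  # popularity counts posts and reshares
        for j in neighbours[i]:
            feeds[j].append(meme)
    return quality, shares
```

Calling `simulate(graph, mu=0.1, alpha=10, steps=100_000)` on any undirected adjacency dictionary returns the quality and share count of every meme introduced during the run; discarding memes created before the steady state is left to the caller.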

We analyse the behaviour of the model by simulating the diffusion and information load process on synthetic scale-free networks (see below). As the competition takes place, some of the memes die fast while others live longer and infect a large fraction of the network. Such a process continues until the system reaches a steady state in which the average number of distinct memes remains roughly constant. This number depends on μ.

The popularity of a meme can be defined by the cumulative attention it gathers across the network. In practice, we measure popularity by counting the number of times a meme is shared or reshared. The measurements occur at steady state.

For each experiment and set of parameter values, we simulated the dynamics of the model on a synthetic undirected network. While some popular social networks are directed, many connections are reciprocal61,62. The results presented here employ scale-free networks built with the preferential attachment model, with N = 10^3 nodes and average degree ⟨k⟩ = 20. The results generalize to larger and sparser networks with higher clustering coefficients63.
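A network of this kind can be generated with a short preferential attachment routine. The sketch below is one common way to implement that model using only the standard library; the function name, the choice of m = 10 links per new node (which gives an average degree close to 20 for N = 10^3) and the initial fully connected core are assumptions made for illustration.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Build an undirected graph as {node: set(neighbours)}.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # 'stubs' lists every node once per incident edge, so a uniform draw
    # from it is a degree-proportional draw over nodes.
    stubs = []
    # Assumption: start from a complete graph on the first m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs.extend((i, j))
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj
```

With `preferential_attachment(1000, 10)` the graph has 9,945 edges, so the average degree is about 19.9.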

Once the system reached the steady state, we performed measurements to determine the success of a meme. To this end, we considered only memes that were introduced after the system reached the steady state. We followed each of these memes from the moment it was first shared until it completely disappeared from the network, recording its quality as well as its popularity. During each simulation, we monitored 100,000 memes that were introduced and forgotten after the system reached the steady state. We ran each simulation 20 times, so that our analyses of popularity took 2 × 10^6 memes into consideration. Measurements of discriminative power and diversity were averaged across runs.
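This section does not restate how the discriminative power τ is computed. The sketch below assumes it is Kendall's rank correlation (ref. 58) between meme quality and popularity, in the tau-b variant that corrects for ties, since popularity counts are integers and tie frequently. The function name and the quadratic pair loop are illustrative choices; for the 10^5 memes monitored per run a faster library routine would be needed.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2))."""
    n = len(x)
    concordant = discordant = tied_x_only = tied_y_only = 0
    for a in range(n):
        for b in range(a + 1, n):
            dx = x[a] - x[b]
            dy = y[a] - y[b]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0
```

Applied to the quality and share-count lists of the monitored memes, values near 1 would indicate that better memes are reliably more popular, and values near 0 that quality and popularity are nearly unrelated.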

Scrolling model of attention. Rather than assuming a fixed feed depth and a single post or repost per session, imagine that users scroll through their feeds by paying attention to messages and resharing them in sequence, until they decide to stop the session. With some probability ρ the user posts a message with a new meme and stops. Otherwise, with probability 1 − ρ, the user performs a scrolling session. After resharing a message, the user stops with probability q. Otherwise, with probability 1 − q, the user scrolls down to view and reshare another message, and so on. Let us further assume that for each session, q is drawn uniformly from an interval [⟨q⟩ − σ, ⟨q⟩ + σ]. The parameter σ represents the level of attention heterogeneity across scrolling sessions. It can be shown analytically that when σ is small, the distribution of α is well approximated by an exponential decay; in the limit σ → 0, P(α) ≈ e^{λ(α−1)} with λ = ln(1 − ⟨q⟩) < 0. As σ → ⟨q⟩, for large α the distribution approaches a power law P(α) ≈ α^{−2}; the heavy tail indicates that users occasionally scroll through a large number of messages. These behaviours are illustrated in Fig. 3c. The parameters ρ, ⟨q⟩ and σ can be tuned to fit the empirical distributions of μ and α from Twitter and Tumblr, respectively (Fig. 4b,c). Both distributions can be fit by approximately the same values of ⟨q⟩ and σ, while ρ is platform dependent.
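The session length implied by this scrolling process can be sampled directly. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the `max_alpha` cap is an added safeguard (not part of the model) for the edge case where q is drawn at or extremely close to zero. For fixed q the session length is geometric, P(α) = q(1 − q)^(α−1), which is the exponential form quoted above with λ = ln(1 − q).

```python
import random


def sample_attention(q_mean, sigma, rng, max_alpha=10**6):
    """Number of messages viewed and reshared in one scrolling session."""
    # The stopping probability is redrawn for every session.
    q = rng.uniform(q_mean - sigma, q_mean + sigma)
    alpha = 1  # the first message is always viewed
    # After each message the user continues with probability 1 - q.
    while alpha < max_alpha and rng.random() >= q:
        alpha += 1
    return alpha


rng = random.Random(42)
homogeneous = [sample_attention(0.1, 0.0, rng) for _ in range(20000)]
heterogeneous = [sample_attention(0.1, 0.09, rng) for _ in range(20000)]
```

With σ = 0 and ⟨q⟩ = 0.1 the sample mean of α should be close to 1/⟨q⟩ = 10, whereas the σ = 0.09 sample contains occasional much longer sessions.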

To explore the effect of the attention heterogeneity σ on discriminative power, we incorporated the scrolling mechanism into the model to generate α and found that high σ leads to a significant decrease in discriminative power. This is due to the fact that although α increases with σ on average, large values yield little benefit in discriminative power whereas small values cause serious discriminative power loss.

Data. Twitter data to measure hashtag popularity was obtained from a sample of approximately 10% of public tweets provided by the Twitter streaming application programming interface and collected in 2014. Rare hashtags have a lower chance of being represented in this sample. We extracted the empirical rate μ from 10^6 Twitter users on the basis of a random sample of these messages. We counted their original tweets (nt) and retweets (nr), then measured each user’s rate as μ = nt/(nt + nr). While inactive users had a lower chance of being represented in the sample, such a bias should not affect the ratio μ.
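The per-user rate μ = nt/(nt + nr) is simple to compute from a message log. The sketch below assumes a hypothetical record format of `(user_id, is_retweet)` pairs; neither the field names nor the function name comes from the Twitter API or from the authors' code.

```python
from collections import Counter


def user_rates(messages):
    """Map each user to mu = n_t / (n_t + n_r).

    messages: iterable of (user_id, is_retweet) pairs.
    """
    originals, retweets = Counter(), Counter()
    for user, is_retweet in messages:
        if is_retweet:
            retweets[user] += 1
        else:
            originals[user] += 1
    users = set(originals) | set(retweets)
    # Every user in the log has at least one message, so the
    # denominator is never zero.
    return {u: originals[u] / (originals[u] + retweets[u]) for u in users}


log = [("a", False), ("a", True), ("a", True), ("b", False), ("c", True)]
rates = user_rates(log)
```

Here user `a` has one original tweet and two retweets, so μ = 1/3; user `b` only posts new content (μ = 1) and user `c` only reshares (μ = 0).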

We extracted the empirical attention data from approximately 10^7 mobile scrolling sessions observed on Tumblr during two weeks in 2016. The feed interface of this app is similar to those of other social media platforms. We considered a session to have ended when there was no interaction for 30 minutes or longer. During a session, we recorded the number of times that a user scrolled at least 500 pixels through the feed and then stopped for at least one second. This number was used as a proxy for α.
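The sessionization rule above can be sketched as follows. The event format `(timestamp_s, scrolled_px, pause_s)` is a hypothetical stand-in for the Tumblr interaction log, which is not described in further detail; only the three thresholds (30 minutes, 500 pixels, one second) come from the text.

```python
SESSION_GAP_S = 30 * 60  # no interaction for 30 minutes ends a session
MIN_SCROLL_PX = 500
MIN_PAUSE_S = 1.0


def attention_per_session(events):
    """Return one attention proxy (alpha) per session for a single user.

    events: time-sorted list of (timestamp_s, scrolled_px, pause_s).
    """
    counts = []
    last_t = None
    for t, scrolled_px, pause_s in events:
        # Open a new session on the first event or after a long gap.
        if last_t is None or t - last_t >= SESSION_GAP_S:
            counts.append(0)
        # Count scrolls of at least 500 px followed by a stop of >= 1 s.
        if scrolled_px >= MIN_SCROLL_PX and pause_s >= MIN_PAUSE_S:
            counts[-1] += 1
        last_t = t
    return counts


events = [
    (0, 600, 2.0),     # counted
    (10, 100, 5.0),    # scroll too short
    (20, 800, 0.5),    # pause too short
    (30, 500, 1.0),    # counted
    (1830, 700, 3.0),  # 30-minute gap: new session, counted
]
alphas = attention_per_session(events)
```

For the example log, the first session yields α = 2 and the second α = 1.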

Data about Facebook shares of articles supporting or debunking true and false claims were collected from Emergent (http://www.emergent.info), a rumour tracking project that is no longer active. Emergent reporters manually selected and evaluated claims appearing in articles shared on social media between September 2014 and March 2015. The Emergent application programming interface provided data about 742 high-quality articles (464 supporting 56 true claims and 278 fact-checking 72 false claims) and 383 low-quality articles (10 undermining 3 true claims and 373 spreading 90 false claims). The same 2:1 ratio of high- to low-quality memes is used to select a quality threshold in the model and generate the distributions in Fig. 5.
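The link between the article counts and the model's quality threshold can be checked with a few lines of arithmetic. Because meme quality is drawn uniformly from the unit interval, the fraction of memes with quality below a threshold θ equals θ, so matching the empirical share of low-quality articles fixes θ. The variable names below are illustrative.

```python
high_quality = 464 + 278  # supporting true claims + fact-checking false claims
low_quality = 10 + 373    # undermining true claims + spreading false claims

low_fraction = low_quality / (high_quality + low_quality)
ratio = high_quality / low_quality

# For quality uniform on [0, 1], P(f < theta) = theta, so theta is the
# empirical low-quality fraction rounded to two decimals.
threshold = round(low_fraction, 2)
```

The counts give 383/1125 ≈ 0.340, a high-to-low ratio of about 1.94, and a threshold that agrees with the f(m) ≥ 0.34 cut-off quoted in the Fig. 5 caption.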

Code availability. The code presented in this manuscript is available at https://github.com/fregolente/virality_of_low_quality_information.

Data availability. The data presented in this manuscript are available at https://github.com/fregolente/virality_of_low_quality_information.

Received 29 September 2016; accepted 24 May 2017; published 26 June 2017

References

1. Simon, H. in Computers, Communication, and the Public Interest (ed. Greenberger, M.) 37–52 (Johns Hopkins Univ. Press, 1971).

2. Weng, L., Flammini, A., Vespignani, A. & Menczer, F. Competition among memes in a world with limited attention. Sci. Rep. 2, 335 (2012).

3. Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).

4. Howell, L. et al. in Global Risks 2013 8th edn (ed. Howell, L.) Section 2 (World Economic Forum, 2013); http://reports.weforum.org/global-risks-2013/risk-case-1/digital-wildfires-in-a-hyperconnected-world/

5. Milton, J. Areopagitica (1644); http://www.dartmouth.edu/~milton/reading_room/areopagitica/text.html

6. Bonilla, J. P. Z. in Philosophy of Economics (ed. Mäki, U.) 823–862 (Handbook of the Philosophy of Science Series, North-Holland, 2012).

7. Dawkins, R. The Selfish Gene (Oxford Univ. Press, 1989).

8. Gonçalves, B., Perra, N. & Vespignani, A. Modeling users’ activity on twitter networks: validation of Dunbar’s number. PLoS ONE 6, e22656 (2011).

9. Goldhaber, M. H. The attention economy and the net. First Monday http://dx.doi.org/10.5210/fm.v2i4.519 (1997).

10. Falkinger, J. Attention economies. J. Econ. Theory 133, 266–294 (2007).

11. Ciampaglia, G. L., Flammini, A. & Menczer, F. The production of information in the attention economy. Sci. Rep. 5, 9452 (2015).

12. Conover, M., Gonçalves, B., Ratkiewicz, J., Flammini, A. & Menczer, F. Predicting the political alignment of twitter users. In Proc. 3rd IEEE Conference on Social Computing (SocialCom) 192–199 (IEEE, 2011).

13. Conover, M. D., Gonçalves, B., Flammini, A. & Menczer, F. Partisan asymmetries in online political activity. EPJ Data Sci. 1, 6 (2012).

14. DiGrazia, J., McKelvey, K., Bollen, J. & Rojas, F. More tweets, more votes: social media as a quantitative indicator of political behaviour. PLoS ONE 8, e79449 (2013).

15. Adler, M. Stardom and talent. Am. Econ. Rev. 75, 208–12 (1985).

16. Bailard, C. S. Democracy’s Double-Edged Sword: How Internet Use Changes Citizens’ Views of Their Government (Johns Hopkins Univ. Press, 2014).

17. Surowiecki, J. The Wisdom of Crowds (Anchor, 2005).

18. Page, S. E. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies (Princeton Univ. Press, 2008).

19. Lorenz, J., Rauhut, H., Schweitzer, F. & Helbing, D. How social influence can undermine the wisdom of crowd effect. Proc. Natl Acad. Sci. USA 108, 9020–9025 (2011).

20. Collins, E. C., Percy, E. J., Smith, E. R. & Kruschke, J. K. Integrating advice and experience: learning and decision making with social and nonsocial cues. J. Pers. Soc. Psychol. 100, 967–982 (2011).

21. Smith, E. R. & Collins, E. C. Contextualizing person perception: distributed social cognition. Psychol. Rev. 116, 343–364 (2009).


22. Nickerson, R. S. Confirmation bias: a ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 2, 175–220 (1998).

23. Smith, E. R. Evil acts and malicious gossip: a multiagent model of the effects of gossip in socially distributed person perception. Pers. Soc. Psychol. Rev. 18, 311–325 (2014).

24. Centola, D. & Macy, M. Complex contagions and the weakness of long ties. Am. J. Sociol. 113, 702–734 (2007).

25. Hiltz, S. R. & Turoff, M. Structuring computer-mediated communication systems to avoid information overload. Commun. ACM 28, 680–689 (1985).

26. Frey, D. Recent research on selective exposure to information. J. Exp. Soc. Psychol. 19, 41–80 (1986).

27. Weng, L. et al. The role of information diffusion in the evolution of social networks. In Proc. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Dhillon, I. S. et al.) 356–364 (ACM, 2013).

28. Babaei, M., Grabowicz, P., Valera, I., Gummadi, K. P. & Gomez-Rodriguez, M. On the efficiency of the information networks in social media. In Proc. 9th ACM International Conference on Web Search and Data Mining 83–92 (ACM, 2016).

29. Axelrod, R. The dissemination of culture: a model with local convergence and global polarization. J. Confl. Resolut. 41, 203–226 (1997).

30. Sunstein, C. R. Republic.com 2.0 (Princeton Univ. Press, 2009).

31. Pariser, E. The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think (Penguin, 2011).

32. Sunstein, C. R. The law of group polarization. J. Political Philos. 10, 175–195 (2002).

33. Conover, M. et al. Political polarization on twitter. In Proc. 5th International AAAI Conference on Weblogs and Social Media (AAAI, 2011).

34. Stanovich, K. E., West, R. F. & Toplak, M. E. Myside bias, rational thinking, and intelligence. Curr. Dir. Psychol. Sci. 22, 259–264 (2013).

35. Nikolov, D., Oliveira, D. F. M., Flammini, A. & Menczer, F. Measuring online social bubbles. PeerJ Comput. Sci. 1, e38 (2015).

36. Mason, W. A., Conrey, F. R. & Smith, E. R. Situating social influence processes: dynamic, multidirectional flows of influence within social networks. Pers. Soc. Psychol. Rev. 11, 279–300 (2007).

37. Nisbett, R. & Ross, L. The Person and the Situation (McGraw-Hill, 1991).

38. Nyhan, B. & Reifler, J. When corrections fail: the persistence of political misperceptions. J. Polit. Behav. 32, 303–330 (2010).

39. Vicario, M. D. et al. The spreading of misinformation online. Proc. Natl Acad. Sci. USA 113, 554–559 (2016).

40. Ratkiewicz, J. et al. Detecting and tracking political abuse in social media. In Proc. 5th International AAAI Conference on Weblogs and Social Media (AAAI, 2011).

41. Ferrara, E., Varol, O., Davis, C., Menczer, F. & Flammini, A. The rise of social bots. Comm. ACM 57, 96–104 (2016).

42. Crane, R. & Sornette, D. Robust dynamic classes revealed by measuring the response function of a social system. Proc. Natl Acad. Sci. USA 105, 15649–15653 (2008).

43. Ratkiewicz, J., Fortunato, S., Flammini, A., Menczer, F. & Vespignani, A. Characterizing and modeling the dynamics of online popularity. Phys. Rev. Lett. 105, 158701 (2010).

44. Bingol, H. Fame emerges as a result of small memory. Phys. Rev. E 77, 036118 (2008).

45. Huberman, B. A. Social computing and the attention economy. J. Stat. Phys. 151, 329–339 (2013).

46. Wu, F. & Huberman, B. A. Novelty and collective attention. Proc. Natl Acad. Sci. USA 104, 17599–17601 (2007).

47. Hodas, N. O. & Lerman, K. How visibility and divided attention constrain social contagion. In Proc. ASE/IEEE International Conference on Social Computing 249–257 (IEEE, 2012).

48. Kang, J.-H. & Lerman, K. in Social Computing, Behavioral Modeling and Prediction. SBP 2015. Lecture Notes in Computer Science Vol. 9021 (eds Agarwal, N. et al.) 101–110 (Springer, 2015).

49. Gleeson, J. P., Ward, J. A., O’Sullivan, K. P. & Lee, W. T. Competition-induced criticality in a model of meme popularity. Phys. Rev. Lett. 112, 048701 (2014).

50. Gleeson, J. P., O’Sullivan, K. P., Baños, R. A. & Moreno, Y. Effects of network structure, competition and memory time on social spreading phenomena. Phys. Rev. X 6, 021019 (2016).

51. Morris, S. Contagion. Rev. Econ. Stud. 67, 57–78 (2000).

52. Goffman, W. & Newill, V. A. Generalization of epidemic theory. Nature 204, 225–228 (1964).

53. Daley, D. J. & Kendall, D. G. Epidemics and rumours. Nature 204, 1118 (1964).

54. Bailey, N. T. J. The Mathematical Theory of Infectious Diseases and Its Applications (Charles Griffin & Co., 1975).

55. Goetz, M., Leskovec, J., McGlohon, M. & Faloutsos, C. Modeling blog dynamics. In Proc. International AAAI Conference on Weblogs and Social Media (eds Adar, E. et al.) (AAAI, 2009).

56. Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009).

57. Simkin, M. V. & Roychowdhury, V. P. A mathematical theory of citing. J. Assoc. Inf. Sci. Technol. 58, 1661–1673 (2007).

58. Kendall, M. A new measure of rank correlation. Biometrika 30, 81–89 (1938).

59. Varol, O., Ferrara, E., Davis, C. A., Menczer, F. & Flammini, A. Online human-bot interactions: detection, estimation, and characterization. In Proc. International AAAI Conference on Web and Social Media (AAAI, 2017).

60. Bessi, A. & Ferrara, E. Social bots distort the 2016 U.S. Presidential election online discussion. First Monday http://dx.doi.org/10.5210/fm.v21i11.7090 (2016).

61. Kumar, R., Novak, J. & Tomkins, A. Structure and evolution of online social networks. In Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 611–617 (ACM, 2006).

62. Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Proc. 19th International Conference on World Wide Web 591–600 (ACM, 2010).

63. Holme, P. & Kim, B. J. Growing scale-free networks with tunable clustering. Phys. Rev. E 65, 026107 (2002).

Acknowledgements

We are grateful to Twitter for providing public post data, to Tumblr for mobile scrolling data, to C. Silverman for the Emergent data, and to J. Gleeson, K. Church, S. Buthpitiya, M. Patel and G. Ciampaglia for discussions and assistance with the data analysis. This work was supported in part by the James S. McDonnell Foundation (grant 220020274) and the National Science Foundation (award CCF-1101743). X.Q. thanks the NaN group in the Center for Complex Networks and Systems Research (http://cnets.indiana.edu) for the hospitality during her stay at the Indiana University School of Informatics and Computing. She was supported by grants from the National Natural Science Foundation of China (No. 90924030), the China Scholarship Council, the ‘Shuguang’ Project of Shanghai Education Commission (No. 09SG38), and the Program of Social Development of Metropolis and Construction of Smart City (No. 085SHDX001). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions

A.F. and F.M. developed the research question. X.Q., D.F.M.O., A.F. and F.M. designed the model. X.Q. and D.F.M.O. conducted the simulations and the primary analyses. D.F.M.O., A.S.S. and F.M. collected and analysed the empirical data. D.F.M.O., A.F. and F.M. wrote the manuscript. X.Q. and A.S.S. edited the manuscript.

Additional information

Reprints and permissions information is available at www.nature.com/reprints.

Correspondence and requests for materials should be addressed to D.F.M.O.

How to cite this article: Qiu, X., F. M. Oliveira, D., Sahami Shirazi, A., Flammini, A. & Menczer, F. Limited individual attention and online virality of low-quality information. Nat. Hum. Behav. 1, 0132 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Competing interests

The authors declare no competing interests.
