TRANSCRIPT
Virality Prediction and Community Structure in Social Networks
Lilian Weng, Filippo Menczer, Yong-Yeol Ahn
Center for Complex Networks and Systems Research (CNetS)
School of Informatics and Computing
Uploaded to YouTube on July 15, 2012. More than 1.38 billion views now!
World population: 7.07 billion
http://researchinprogress.tumblr.com
As a PhD student:
As a post doc:
We are pleased to inform you that your paper has been accepted!
4 months: 2.4k FB likes, 71 notes
As a professor:
by Nikolaj & Jilles
Corporations, Government
Political Campaigns
Virality Attention
Viral information spreads through
social networks
Epidemic spreading: germs and viruses spread
through the “social” network
Ideas and behaviors also spread
Are they the same?
Aha! They are different
Social reinforcement: multiple exposures
D. Centola, Science 2010
“Small” world vs. “Large” world
Which network is better at spreading information quickly?
Information spreads more quickly on the “large” world network
“Large” world: easier to have multiple exposures
“Small” world: harder to have multiple exposures
D. Centola, Science 2010
Simple Contagions
Epidemic spreading: germs spread through the “social” network
Complex Contagions
Ideas and behaviors also spread
D. Centola and M. Macy. Complex Contagions and the Weakness of Long Ties. AJS, 2007.
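A minimal sketch of the difference between the two contagion types, assuming a threshold rule that is not spelled out on the slides: a node adopts once at least `threshold` of its neighbors have adopted (1 for a simple contagion, 2 for a complex one). The two ring-shaped toy networks, the helper names `circulant` and `spread`, and the seed nodes are illustrative assumptions.

```python
def circulant(n, offsets):
    """Ring of n nodes; node i is linked to i +/- each offset (mod n)."""
    return {i: {(i + s * o) % n for o in offsets for s in (1, -1)}
            for i in range(n)}


def spread(graph, seeds, threshold):
    """Synchronous threshold contagion (an assumed, simplified rule).

    Returns the final set of adopters and the number of rounds it took.
    """
    adopted = set(seeds)
    rounds = 0
    while True:
        new = {v for v in graph
               if v not in adopted and len(graph[v] & adopted) >= threshold}
        if not new:
            return adopted, rounds
        adopted |= new
        rounds += 1


# "Large" world: links only to nearby nodes, so neighborhoods overlap.
clustered = circulant(20, (1, 2))
# "Small" world: one short and one long tie per side, no triangles.
long_ties = circulant(20, (1, 7))

for name, graph in (("clustered", clustered), ("long ties", long_ties)):
    for label, threshold in (("simple", 1), ("complex", 2)):
        adopted, rounds = spread(graph, {0, 1}, threshold)
        print(f"{name:10s} {label:8s} adopters={len(adopted):2d} rounds={rounds}")
```

In this toy setup the simple contagion reaches everyone on both networks and is faster with long ties, while the complex contagion only spreads on the clustered ring, where two adopted neighbors are easy to come by.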
(Node) Users
(Edge) Social Relationship
(Community) Social Circles
How do communities affect information
diffusion?
Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap information flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, inducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet activity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity; links represent mutual following relations.
Structural Trapping
Traps for random walkers
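A small sketch of why dense communities act as traps for random walkers. The toy network (two fully connected groups joined by a single link), the function names, and the step count are assumptions for illustration, not the setup used in the talk.

```python
import random


def two_cliques(size):
    """Two fully connected groups of `size` nodes joined by a single link."""
    graph = {i: set() for i in range(2 * size)}
    for base in (0, size):
        members = range(base, base + size)
        for u in members:
            graph[u].update(v for v in members if v != u)
    graph[0].add(size)       # the only link between the two communities
    graph[size].add(0)
    return graph


def stay_fraction(graph, community_of, steps, seed=0):
    """Fraction of random-walk steps that stay inside the current community."""
    rng = random.Random(seed)
    node = 0
    stays = 0
    for _ in range(steps):
        nxt = rng.choice(sorted(graph[node]))
        stays += community_of[nxt] == community_of[node]
        node = nxt
    return stays / steps


size = 10
graph = two_cliques(size)
community_of = {node: node // size for node in graph}
print(stay_fraction(graph, community_of, steps=10_000))
```

With only one outgoing link per community, the walker crosses over on roughly one step in a hundred, so nearly all of its moves stay inside the community it is already in.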
Social Reinforcement
[Figure 1B: social reinforcement. Multiple exposures under high clustering vs. low clustering]
Homophily
[Figure 1C: homophily]
Communities trap information.
(1) Structural Trapping
(2) Social Reinforcement
(3) Homophily
More communication within than across communities.
500 million users, 340 million tweets per day
Gardenhose (10%)
Follow: subscribe to users
Hashtag: topic identifier, e.g. #ows
Tweet: short messages
Retweet: spread messages
Mention: @users
Hashtag ~ Meme [1]
[1] Richard Dawkins. The Selfish Gene. 1989.
truthy.indiana.edu
Two community detection methods
Disjoint communities: Infomap (Rosvall & Bergstrom, 2008)
Overlapping communities: link clustering (Ahn, Bagrow, Lehmann, 2010)
Follower Network
[Figure 1E: follower network of Twitter users sharing #bbc (BBC News) and #foxnews (Fox News)]
Retweet Network
[Figure 1D: retweet network of Twitter users sharing #usa (English and Arabic users)]
Do the edges inside communities transmit
more information?
For each community, we measure the average edge weights of intra-
and inter-community links.
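The measurement above can be sketched as follows. This is a hedged illustration: the input layout (`edges` as a weight dictionary, `community_of` as a node-to-label map) and the function name are assumptions, and a link across two communities is counted once for each of them.

```python
from collections import defaultdict


def community_edge_weights(edges, community_of):
    """Average intra- and inter-community edge weight for each community.

    edges: {(u, v): weight}; community_of: {node: community label}.
    Returns {community: (avg_intra, avg_inter)}, with None if no such edges.
    """
    intra = defaultdict(list)
    inter = defaultdict(list)
    for (u, v), weight in edges.items():
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            intra[cu].append(weight)
        else:
            # A link across communities counts for both of its endpoints.
            inter[cu].append(weight)
            inter[cv].append(weight)

    def mean(values):
        return sum(values) / len(values) if values else None

    return {c: (mean(intra[c]), mean(inter[c]))
            for c in set(community_of.values())}


# Toy data: weights stand for retweet or mention counts on each link.
edges = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 4,
         ("d", "e"): 6, ("c", "d"): 1}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(community_edge_weights(edges, community_of))
```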
[Figure 2A: box plots of community edge weight; axis labels w_RT and w_@]
[Figure 2B: box plots of user community focus; axis labels f_RT and f_@]
Figure 2: Meme concentration in communities. We show (A) community edge weight and (B) user community focus using box plots. Boxes cover 50% of data and whiskers cover 95%. The line and triangle in a box represent the median and mean, respectively. Changes in meme concentration as a function of meme popularity are illustrated by plotting (C) meme usage dominance, (D) meme usage entropy, (E) meme adopter dominance, and (F) meme adopter entropy. Dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than the baseline model M2. Similar results for different types of networks and community methods are described in Supplementary Information.
For each user, we measure the fraction of activity that is directed to each neighbor
in the same or a different community.
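A rough sketch of this per-user measurement, assuming activity is stored as counts of retweets or mentions per neighbor. The data layout and the name `community_focus` are hypothetical, and every user is assumed to have at least one recorded interaction.

```python
def community_focus(activity, community_of):
    """Per-user average share of activity per same/other-community neighbor.

    activity: {user: {neighbor: number of retweets or mentions}}.
    Returns {user: (avg_share_same, avg_share_other)}, None where undefined.
    """
    focus = {}
    for user, targets in activity.items():
        total = sum(targets.values())
        same, other = [], []
        for neighbor, count in targets.items():
            share = count / total
            if community_of[neighbor] == community_of[user]:
                same.append(share)
            else:
                other.append(share)
        focus[user] = (sum(same) / len(same) if same else None,
                       sum(other) / len(other) if other else None)
    return focus


# Toy data: user "a" directs most activity to neighbors in its own community.
activity = {"a": {"b": 6, "c": 3, "d": 1}}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1}
print(community_focus(activity, community_of))
```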
Is the communication concentrated inside
communities?
No concentration: randomly distributed
Weak concentration: random diffusion
Strong concentration: distributed more in a few communities
Null models
M1: random selection
M2: random diffusion (structural trapping)
Relative Usage Dominance
Usage dominance: proportion of #tweets in the dominant community
[Figure 2C: relative usage dominance vs. total #tweets, real data]
Relative Usage Entropy
Usage entropy: entropy of #tweets distributed in different communities
[Figure 2D: relative usage entropy vs. total #tweets, real data]
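A minimal sketch of the two concentration measures for a single hashtag, given the community of the user behind each tweet. The base-2 logarithm and the function names are assumptions. The slides plot these quantities relative to a baseline; this sketch only computes the raw values.

```python
from collections import Counter
from math import log2


def usage_dominance(tweet_communities):
    """Proportion of tweets produced in the most active community."""
    counts = Counter(tweet_communities)
    return max(counts.values()) / len(tweet_communities)


def usage_entropy(tweet_communities):
    """Shannon entropy (bits) of how tweets are spread across communities."""
    counts = Counter(tweet_communities)
    total = len(tweet_communities)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# One entry per tweet: the community of the user who posted it.
concentrated = ["c1"] * 8 + ["c2"] * 2
spread_out = ["c1", "c2", "c3", "c4"] * 5

for name, tweets in (("concentrated", concentrated), ("spread out", spread_out)):
    print(name, usage_dominance(tweets), round(usage_entropy(tweets), 3))
```

High dominance and low entropy both indicate a meme that stays concentrated in a few communities; a meme that spreads evenly shows the opposite pattern.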
Viral memes are less trapped by communities, spreading like diseases.
Viral memes
[Figure 1, panels A-E (structural trapping, social reinforcement, homophily, retweet network, follower network), annotated "Just this" and "Other memes"]
Simple Contagions vs. Complex Contagions
[Figure: average #exposures required for each adopter vs. total #tweets; regions labeled "Complex Contagions", "Simple Contagions", and "Not enough data"]
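One way to estimate the average number of exposures per adopter is sketched below. It assumes an exposure is any earlier tweet with the hashtag posted by an account the user follows, and it counts first adopters with zero exposures in the average. Both choices, along with the data layout and function name, are assumptions rather than details stated on the slides.

```python
def average_exposures(events, followees):
    """Average number of exposures a user received before first adopting.

    events: list of (time, user) tweets containing the hashtag.
    followees: {user: set of accounts that user follows}.
    """
    events = sorted(events)
    first_use = {}
    for time, user in events:
        first_use.setdefault(user, time)   # keep the earliest use only

    exposures = []
    for user, adopted_at in first_use.items():
        seen = sum(1 for time, author in events
                   if time < adopted_at and author in followees.get(user, ()))
        exposures.append(seen)
    return sum(exposures) / len(exposures)


# Toy data: "b" follows "a"; "c" follows both "a" and "b".
events = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]
followees = {"b": {"a"}, "c": {"a", "b"}}
print(average_exposures(events, followees))
```

A low average suggests that a single exposure is often enough (closer to a simple contagion), while a high average points to social reinforcement (closer to a complex contagion).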
Maybe... we can do virality prediction by
quantifying concentration?
[Figure 4 panels: (A) #ThoughtsDuringSchool, early stage (30 tweets) and late stage (200 tweets); (B) #ProperBand, early stage (30 tweets) and final stage (65 tweets). Legends: old to new; less dominant to more dominant]
Figure 4: Evolution of two contrasting memes (viral vs. non-viral) in terms of community structure. We represent each community as a node, whose size is proportional to the number of tweets produced by the community. The color of a community represents the time when the hashtag is first used in the community. (A) The evolution of a viral meme (#ThoughtsDuringSchool) from the early stage (30 tweets) to the late stage (200 tweets) of diffusion. (B) The evolution of a non-viral meme (#ProperBand) from the early stage (30 tweets) to the final stage (65 tweets).
Viral memes: ≥ θ_T tweets.
[Figure 3 panels: (A), (B) precision and (C), (D) recall at thresholds 300, 500, 1000, for community-based prediction, community-blind prediction, and random guess]
(A) Precision at 300 / 500 / 1000: random guess 0.25 / 0.20 / 0.12; community-blind 0.50 / 0.48 / 0.28; community-based 0.78 / 0.71 / 0.50
(B) Precision at 300 / 500 / 1000: random guess 0.21 / 0.15 / 0.11; community-blind 0.52 / 0.35 / 0.29; community-based 0.76 / 0.67 / 0.49
(C) Recall at 300 / 500 / 1000: random guess 0.25 / 0.19 / 0.12; community-blind 0.39 / 0.31 / 0.08; community-based 0.55 / 0.53 / 0.24
(D) Recall at 300 / 500 / 1000: random guess 0.22 / 0.14 / 0.11; community-blind 0.41 / 0.21 / 0.08; community-based 0.60 / 0.49 / 0.23
Figure 3: Prediction performance. We predict whether a meme will go viral or not; a meme is labeled as viral if it produces more than a certain threshold (θ = 300, 500, 1000) of tweets (T) or adopters (U). We use the random forests classifier trained on community concentration features, which are calculated based on the initial n = 50 tweets for each meme. Prediction results are robust across different networks and community detection methods (see Supplementary Information). We compute precision and recall to compare our prediction results against two baselines.
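For reference, precision and recall for a viral/non-viral prediction can be computed as in the sketch below. The tweet counts and predictions are made-up toy values, and the helper name `precision_recall` is an assumption; only the labeling rule (viral means more than a threshold of tweets) comes from the caption.

```python
def precision_recall(predicted, actual):
    """Precision and recall for boolean 'viral' predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: final tweet counts per meme and a classifier's guesses.
threshold = 300
tweet_counts = [1200, 80, 450, 310, 60, 2000]
actual = [count > threshold for count in tweet_counts]
predicted = [True, True, False, True, False, True]

print(precision_recall(predicted, actual))
```

Precision is the share of memes predicted viral that really are; recall is the share of viral memes that the prediction catches.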
Summary
•Communities give us invaluable information about the spreading patterns of memes.
•We can predict viral memes by looking at communities.
•Non-viral memes seem to act like complex contagions, strongly affected by social reinforcement and homophily, while viral memes are not.
•Viral memes spread (literally) like epidemics.
Thanks! Questions?
Lilian Weng, Fil Menczer, YY Ahn
{weng, fil, yyahn}@indiana.edu