hashtags on twitter: information diffusionpawang/courses/sc15/lec3.pdf · pawan goyal (iit...

33
Hashtags on Twitter: Information Diffusion Pawan Goyal CSE, IITKGP July 31, 2015 Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 1 / 16

Upload: others

Post on 22-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Hashtags on Twitter: Information Diffusion

Pawan Goyal

CSE, IITKGP

July 31, 2015

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 1 / 16

Page 2: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Third Reference

Daniel M. Romero, Brendan Meeder, and Jon Kleinberg. 2011. Differences inthe mechanics of information diffusion across topics: idioms, politicalhashtags, and complex contagion on twitter. In Proceedings of the 20thinternational conference on World wide web (WWW ’11). ACM, New York, NY,USA, 695-704.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 2 / 16

Page 3: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

What is Information Diffusion?

Online Information DiffusionUnderstanding the tendency for people to engage in activities such asforwarding messages, linking to articles, joining groups, purchasing products,or becoming fans of pages after some number of their friends have.

Objectives of this researchWidespread belief that different kinds of information spread differentlyonline.

To study this issue on Twitter, analyzing the ways in which Hashtagsspread on a network defined by interactions among Twitter users.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 3 / 16

Page 4: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

What is Information Diffusion?

Online Information DiffusionUnderstanding the tendency for people to engage in activities such asforwarding messages, linking to articles, joining groups, purchasing products,or becoming fans of pages after some number of their friends have.

Objectives of this researchWidespread belief that different kinds of information spread differentlyonline.

To study this issue on Twitter, analyzing the ways in which Hashtagsspread on a network defined by interactions among Twitter users.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 3 / 16

Page 5: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

What is Information Diffusion?

Online Information DiffusionUnderstanding the tendency for people to engage in activities such asforwarding messages, linking to articles, joining groups, purchasing products,or becoming fans of pages after some number of their friends have.

Objectives of this researchWidespread belief that different kinds of information spread differentlyonline.

To study this issue on Twitter, analyzing the ways in which Hashtagsspread on a network defined by interactions among Twitter users.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 3 / 16

Page 6: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Twitter Data and Graph Construction

Twitter data crawled from August 2009 until January 2010.

Collected over 3 billion messages from more than 60 million users.

Graph construction via @-messages: X → Y if X directed at least 3@-messages to Y .

Graph size: 8.5 million non-isolated nodes, 50 million links

Studies 500 most used hashtags

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16

Page 7: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Twitter Data and Graph Construction

Twitter data crawled from August 2009 until January 2010.

Collected over 3 billion messages from more than 60 million users.

Graph construction via @-messages: X → Y if X directed at least 3@-messages to Y .

Graph size: 8.5 million non-isolated nodes, 50 million links

Studies 500 most used hashtags

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16

Page 8: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Twitter Data and Graph Construction

Twitter data crawled from August 2009 until January 2010.

Collected over 3 billion messages from more than 60 million users.

Graph construction via @-messages: X → Y if X directed at least 3@-messages to Y .

Graph size: 8.5 million non-isolated nodes, 50 million links

Studies 500 most used hashtags

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16

Page 9: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Twitter Data and Graph Construction

Twitter data crawled from August 2009 until January 2010.

Collected over 3 billion messages from more than 60 million users.

Graph construction via @-messages: X → Y if X directed at least 3@-messages to Y .

Graph size: 8.5 million non-isolated nodes, 50 million links

Studies 500 most used hashtags

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16

Page 10: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Twitter Data and Graph Construction

Twitter data crawled from August 2009 until January 2010.

Collected over 3 billion messages from more than 60 million users.

Graph construction via @-messages: X → Y if X directed at least 3@-messages to Y .

Graph size: 8.5 million non-isolated nodes, 50 million links

Studies 500 most used hashtags

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16

Page 11: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Hashtag Categories

Manually identified 8broad categories withatleast 20 HTs in each

Authors and 3volunteersindependentlyannotated eachhashtag.

Levels of agreementwas high

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 5 / 16

Page 12: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Exposure Curve: Defining p(k)

Neighbor Set of XFor a given user X, the set of other users to whom X has an edge.

When does X start mentioning a hashtag H?How do successive exposures to H affect the probability that X will beginmentioning it?

Look at all users X who have not mentioned H, but for whom k neighborshave

p(k): fraction of users who adopt the hashtag direct after their kth

exposure, given that they hadn’t yet adopted it.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 6 / 16

Page 13: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Exposure Curve: Defining p(k)

Neighbor Set of XFor a given user X, the set of other users to whom X has an edge.

When does X start mentioning a hashtag H?How do successive exposures to H affect the probability that X will beginmentioning it?

Look at all users X who have not mentioned H, but for whom k neighborshave

p(k): fraction of users who adopt the hashtag direct after their kth

exposure, given that they hadn’t yet adopted it.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 6 / 16

Page 14: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Exposure Curve: Defining p(k)

Neighbor Set of XFor a given user X, the set of other users to whom X has an edge.

When does X start mentioning a hashtag H?How do successive exposures to H affect the probability that X will beginmentioning it?

Look at all users X who have not mentioned H, but for whom k neighborshave

p(k): fraction of users who adopt the hashtag direct after their kth

exposure, given that they hadn’t yet adopted it.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 6 / 16

Page 15: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Average Exposure Curve for 500 most-mentioned hashtags

A ramp-up to the peak value, reached relatively early (k = 2,3,4)

Decline for larger values of k

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 7 / 16

Page 16: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Average Exposure Curve for 500 most-mentioned hashtags

A ramp-up to the peak value, reached relatively early (k = 2,3,4)

Decline for larger values of k

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 7 / 16

Page 17: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence and Stickiness

StickinessThe maximum value of p(k)(probability of usage at the mosteffective exposure)

PersistenceA measure of the decay ofexposure curves.The ratio of the area under thecurve P and the area of therectangle of length max(P) andwidth max(D(P)).

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 8 / 16

Page 18: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence and Stickiness

StickinessThe maximum value of p(k)(probability of usage at the mosteffective exposure)

PersistenceA measure of the decay ofexposure curves.

The ratio of the area under thecurve P and the area of therectangle of length max(P) andwidth max(D(P)).

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 8 / 16

Page 19: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence and Stickiness

StickinessThe maximum value of p(k)(probability of usage at the mosteffective exposure)

PersistenceA measure of the decay ofexposure curves.The ratio of the area under thecurve P and the area of therectangle of length max(P) andwidth max(D(P)).

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 8 / 16

Page 20: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Approximating Exposure Curves via Stickiness and Persistence

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 9 / 16

Page 21: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtags based on Stickiness

Technology and Movies have lower stickiness than a random subset

Music has higher stickiness than a random subset

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 10 / 16

Page 22: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtags based on Stickiness

Technology and Movies have lower stickiness than a random subset

Music has higher stickiness than a random subset

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 10 / 16

Page 23: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtags based on Persistence

Idioms and Music have lower persistence than a random subset ofhashtags of the same size

Politics and Sports have higher persistence than a random subset

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 11 / 16

Page 24: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtags based on Persistence

Idioms and Music have lower persistence than a random subset ofhashtags of the same size

Politics and Sports have higher persistence than a random subset

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 11 / 16

Page 25: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence vs. Stickiness

Idioms and Politics: Same stickiness but opposite persistence

Music has high stickiness but low persistence

Stickiness does not explain the diffusion well by itself

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 12 / 16

Page 26: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence vs. Stickiness

Idioms and Politics: Same stickiness but opposite persistence

Music has high stickiness but low persistence

Stickiness does not explain the diffusion well by itself

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 12 / 16

Page 27: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Persistence vs. Stickiness

Idioms and Politics: Same stickiness but opposite persistence

Music has high stickiness but low persistence

Stickiness does not explain the diffusion well by itself

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 12 / 16

Page 28: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Sample curves for #cantlivewithout (blue) and #hcr (red)

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 13 / 16

Page 29: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtag by Mention and User Counts

Political and Idioms are among the most mentioned, but Idioms are used bytwice the number of people that use Politics

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 14 / 16

Page 30: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Comparison of Hashtag by Mention and User Counts

Political and Idioms are among the most mentioned, but Idioms are used bytwice the number of people that use Politics

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 14 / 16

Page 31: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

The Structure of Initial Sets

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 15 / 16

Page 32: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Structure Comparison for Political Hashtags (G500)

The early adopters of a political hashtag message more with each other,create more triangles, and have a border of people with more links intothe early adopter set.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 16 / 16

Page 33: Hashtags on Twitter: Information Diffusionpawang/courses/SC15/lec3.pdf · Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 4 / 16. Twitter Data and Graph Construction

Structure Comparison for Political Hashtags (G500)

The early adopters of a political hashtag message more with each other,create more triangles, and have a border of people with more links intothe early adopter set.

Pawan Goyal (IIT Kharagpur) Hashtags on Twitter July 31, 2015 16 / 16