jure leskovec, cmu eric horwitz, microsoft research

32
Planetary Scale Views on an Instant Messenger Communication Network Jure Leskovec, CMU Eric Horwitz, Microsoft Research

Upload: francis-george

Post on 28-Dec-2015

226 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

Planetary Scale Views on an Instant Messenger Communication Network

Jure Leskovec, CMUEric Horwitz, Microsoft Research

Page 2: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

2

Instant Messaging

Contact (buddy) list Messaging window

Page 3: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

3

Instant Messaging as a Network

Buddy Conversation

Page 4: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

4

IM – Phenomena at planetary scale

Observe social and communication phenomena at planetary scale

Practically the largest social network

Research questions: How does communication change with user

demographics (age, sex, language, country)? How does geography affect communication? What is the structure of the communication

network?

Page 5: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

5

Communication data

The record of communication User demographic data (self-reported):

Age Gender Location (Country, ZIP) Language

Communication data: For each conversation: participants and time No message text

Page 6: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

6

Data statistics: Total activity We collected the data for June 2006 Log size:

150Gb/day (compressed) Total: 1 month of communication data:

4.5Tb of compressed data Activity over June 2006 (30 days)

245 million users logged in 180 million users engaged in conversations 17,5 million new accounts activated More than 30 billion conversations More than 255 billion messages

Page 7: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

7

Data statistics: Typical day

Activity on June 1 2006: 1 billion conversations 93 million users login 65 million different users talk (exchange

messages) 1.5 million invitations for new accounts sent

Page 8: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

Part 3-8

User & Communication characteristics

How does user demographics influence communication?

Page 9: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

9

User characteristics: Age

Page 10: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

10

Age piramid: MSN vs. the world

Page 11: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

Part 3-11

Communication: Demographics

Correlation

People tend to talk to similar people (except gender)

How do people’s attributes (age, gender) influence communication?

Probability

Page 12: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

12

Age: Number of conversations

Use

r se

lf r

eport

ed a

ge

High

Low

1) Young people communicate with same

age2) Older people

communicate uniformly across ages

Page 13: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

13

Age: Total conversation duration

Use

r se

lf r

eport

ed a

ge

High

Low

1) Old people talk long2) Working ages (25-40)

talk short

Page 14: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

14

Age: Messages per conversation

Use

r se

lf r

eport

ed a

ge

High

Low

1) Old people talk long2) Working ages (25-40)

talk quick

Page 15: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

15

Age: Messages per unit time

Use

r se

lf r

eport

ed a

ge

High

Low

1) Old people talk slow2) Young talk fast

Page 16: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

16

Communication: Gender

Is gender communication biased? Do female talk more among themselves? Do male-female conversations took longer?

Findings: No of. conversations is not biased (follows chance) Cross gender conversations take longer and are mo

intense (people put more attention)

M F49%21%20%

Conversations

M F5mi n4.5 min4min

Duration

M F7.66.65.9

Messages/conversation

Page 17: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

17

Communication: Geo distance

Longer links are used more

Page 18: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

18

Communication: Geography (1)

Each dot represents number of users at geo locationMap of the world appears!Costal regions dominate

Page 19: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

19

Communication: Geography (2)

Users per capita

Mid-USA and Scandinavia have highest per capita

Fraction of country’s population on MSN:•Iceland: 35%•Spain: 28%•Netherlands, Canada, Sweden, Norway: 26%•France, UK: 18%•USA, Brazil: 8%

Page 20: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

20

World communication axis

Northern hemisphere dominates communication

For each conversation between geo points (A,B) we increase the intensity on the line between A and B

Page 21: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

Messaging as a Network

21At least 1 message exchanged

Page 22: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

22

IM Communication Network Buddy graph (estimate)

240 million people (people that login in June ’06) 9.1 billion buddy edges (friendship links)

Communication graph Edge if the users exchanged at least 1 message 180 million people 1.3 billion edges 30 billion conversations

Page 23: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

23

Network: Number of buddies

Number of buddies follows power-law with exponential

cutoff distribution

Limit of 600 buddies

Page 24: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

24

Network: Small world

Milgram’s small world experiment

(i.e., hops + 1)

Small-world experiment [Milgram ‘67] People send letters from Nebraska to Boston

How many steps does it take? Messenger social network of the whole planet Eart

240M people, 1.3B edgesMSN Messenger network

Number of steps

between pairs of people

Avg. path length 6.690% of the people can be reached in <

8 hops

Page 25: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

25

Network: Clustering

How many triangles are closed?

Clustering normally decays as k-1

High clustering Low clustering

Communication network is highly clustered: k-0.37

Page 26: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

26

Network: Connectivity

Page 27: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

27

Network: k-Cores decomposition

What is the structure of the core of the network?

[Batagelj & Zaveršnik, 2002]

Page 28: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

28

k-Cores: core of the network

People with k<20 are the periphery Core is composed of 79 people, each having 68

edges among them

Page 29: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

29

Network: Tie-strength

Remove nodes (in some order) and observe how network falls apart: Number of edges deleted Size of largest connected component

Order nodes by: Number of links Total conversations Total conv. Duration Messages/conversation Avg. sent, avg. duration

Page 30: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

30

Strength: Nodes vs. Edges

Page 31: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

31

Strength: Connectivity

Page 32: Jure Leskovec, CMU Eric Horwitz, Microsoft Research

32

Conclusion

Social network of the whole planet Earth The largest social network analyzed

Strong presence of homophily people that communicate are similar (except gender)

Well connected Small-world in only few hops one can research most of the

network Very robust:

many (random) people can be removed and the network is still connected