jure leskovec, cmu kevin lang, anirban dasgupta and michael mahoney yahoo! research

26
Statistical properties of network community structure Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Upload: amelia-wordell

Post on 16-Dec-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Statistical properties of network community structure

Jure Leskovec, CMUKevin Lang, Anirban Dasgupta and Michael MahoneyYahoo! Research

Page 2: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Network communities

Communities: Sets of nodes with lots

of connections inside and few to outside (the rest of the network)

Assumption: Networks are

(hierarchically) composed of communities (modules)

Communities, clusters, groups,

modules

Page 3: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

How community like is a set of nodes?

Want a measure that corresponds to intuition

Conductance (normalized cut):Φ(S) = # edges cut / # edges inside

Small Φ(S) corresponds to more community-like sets of nodes

Page 4: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Page 5: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad communit

yΦ=5/7 = 0.7

Page 6: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Page 7: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Best communit

yΦ=2/8 = 0.25

Page 8: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Network Community Profile Plot We define:

Network community profile (NCP) plotPlot the score of best community of size k

Search over all subsets of size k and find best: Φ(k=5) = 0.25

NCP plot is intractable to compute Use approximation algorithm

Page 9: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Small Social Network

Dolphin social network Two communities of dolphins

NCP plotNetwork

Page 10: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Zachary’s karate club

Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B

NCP plotNetwork

Page 11: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Network Science

Collaborations between scientists in Networks

NCP plotNetwork

Page 12: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Geometric and Hierarchical graphs

Hierarchical network

Geometric (grid-like) network – Small social

networks– Geometric and– Hierarchical networkhave downward NCP plot

Page 13: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Our work: Large networks

Previously researchers examined community structure of small networks (~100 nodes)

We examined more than 70 different large social and information networks

Large real-world networks look

completely different!

Page 14: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Example of our findings

Typical example:General relativity collaboration network (4,158 nodes, 13,422 edges)

Page 15: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP: LiveJournal (N=5M, E=42M)

Better and better

communities

Worse and worse

communities

Best community

has 100 nodes

Com

mu

nit

y s

core

Community size

Page 16: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Explanation: Downward part

Whiskers are responsible for downward slope of NCP plot

Whisker is a set of nodes connected to the network by a single

edge

NCP plot

Largest whisker

Page 17: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Explanation: Upward part

Each new edge inside the community costs more

NCP plot

Φ=2/1 = 2

Φ=8/3 = 2.6

Φ=64/11 = 5.8

Each node has twice as many

children

Page 18: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Suggested Network Structure

Network structure: Core-

periphery, jellyfish, octopus

Whiskers are responsible for

good communities

Denser and denser core

of the network

Core contains 60%

node and 80% edges

Page 19: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Caveat: Bag of whiskers

What if we allow cuts that give disconnected communities?

Cut all whiskers and compose

communities out of them

Page 20: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Caveat: Bag of whiskersC

om

mu

nit

y s

core

Community size

We get better community scores when

composing disconnected sets of

whiskers

Connected communitie

sBag of whiskers

Page 21: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Comparison to a rewired network

Rewired network: random

network with same degree distribution

Page 22: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

What is a good model?

What is a good model that explains such network structure?

None of the existing models work

Pref. attachment Small World Geometric Pref. Attachment

FlatDown and Flat

Flat and Down

Page 23: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Forest Fire model works

Forest Fire: connections spread like a fire New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively

As community grows it

blends into the core of

the network

Page 24: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Forest Fire NCP plot

rewired

network

Bag of whisk

ers

Page 25: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Conclusion and connections Whiskers:

Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social

relationship to 150 people Bond vs. identity communites

Core: Newman et al. analyzed 400k node product network▪ Largest community has 50% nodes▪ Community was labeled “miscelaneous”

Page 26: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Conclusion and connections

NCP plot is a way to analyze network community structure

Our results agree with previous work on small networks

But large networks are fundamentally different

Large networks have core-periphery structure Small well isolated communities blend into the

core of the networks as they grow