Transcript
Page 1: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Statistical properties of network community structure

Jure Leskovec, CMUKevin Lang, Anirban Dasgupta and Michael MahoneyYahoo! Research

Page 2: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Network communities

Communities: Sets of nodes with lots

of connections inside and few to outside (the rest of the network)

Assumption: Networks are

(hierarchically) composed of communities (modules)

Communities, clusters, groups,

modules

Page 3: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

How community like is a set of nodes?

Want a measure that corresponds to intuition

Conductance (normalized cut):Φ(S) = # edges cut / # edges inside

Small Φ(S) corresponds to more community-like sets of nodes

Page 4: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Page 5: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad communit

yΦ=5/7 = 0.7

Page 6: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Page 7: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Best communit

yΦ=2/8 = 0.25

Page 8: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Network Community Profile Plot We define:

Network community profile (NCP) plotPlot the score of best community of size k

Search over all subsets of size k and find best: Φ(k=5) = 0.25

NCP plot is intractable to compute Use approximation algorithm

Page 9: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Small Social Network

Dolphin social network Two communities of dolphins

NCP plotNetwork

Page 10: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Zachary’s karate club

Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B

NCP plotNetwork

Page 11: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP plot: Network Science

Collaborations between scientists in Networks

NCP plotNetwork

Page 12: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Geometric and Hierarchical graphs

Hierarchical network

Geometric (grid-like) network – Small social

networks– Geometric and– Hierarchical networkhave downward NCP plot

Page 13: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Our work: Large networks

Previously researchers examined community structure of small networks (~100 nodes)

We examined more than 70 different large social and information networks

Large real-world networks look

completely different!

Page 14: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Example of our findings

Typical example:General relativity collaboration network (4,158 nodes, 13,422 edges)

Page 15: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

NCP: LiveJournal (N=5M, E=42M)

Better and better

communities

Worse and worse

communities

Best community

has 100 nodes

Com

mu

nit

y s

core

Community size

Page 16: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Explanation: Downward part

Whiskers are responsible for downward slope of NCP plot

Whisker is a set of nodes connected to the network by a single

edge

NCP plot

Largest whisker

Page 17: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Explanation: Upward part

Each new edge inside the community costs more

NCP plot

Φ=2/1 = 2

Φ=8/3 = 2.6

Φ=64/11 = 5.8

Each node has twice as many

children

Page 18: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Suggested Network Structure

Network structure: Core-

periphery, jellyfish, octopus

Whiskers are responsible for

good communities

Denser and denser core

of the network

Core contains 60%

node and 80% edges

Page 19: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Caveat: Bag of whiskers

What if we allow cuts that give disconnected communities?

Cut all whiskers and compose

communities out of them

Page 20: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Caveat: Bag of whiskersC

om

mu

nit

y s

core

Community size

We get better community scores when

composing disconnected sets of

whiskers

Connected communitie

sBag of whiskers

Page 21: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Comparison to a rewired network

Rewired network: random

network with same degree distribution

Page 22: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

What is a good model?

What is a good model that explains such network structure?

None of the existing models work

Pref. attachment Small World Geometric Pref. Attachment

FlatDown and Flat

Flat and Down

Page 23: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Forest Fire model works

Forest Fire: connections spread like a fire New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively

As community grows it

blends into the core of

the network

Page 24: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Forest Fire NCP plot

rewired

network

Bag of whisk

ers

Page 25: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Conclusion and connections Whiskers:

Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social

relationship to 150 people Bond vs. identity communites

Core: Newman et al. analyzed 400k node product network▪ Largest community has 50% nodes▪ Community was labeled “miscelaneous”

Page 26: Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Conclusion and connections

NCP plot is a way to analyze network community structure

Our results agree with previous work on small networks

But large networks are fundamentally different

Large networks have core-periphery structure Small well isolated communities blend into the

core of the networks as they grow


Top Related