jure leskovec, cmu kevin lang, anirban dasgupta, michael mahoney yahoo! research

Post on 17-Dec-2015

TRANSCRIPT

Properties of network community structure

Jure Leskovec, CMU

Kevin Lang, Anirban Dasgupta, Michael Mahoney, Yahoo! Research

2

Networks

• Big data
• Study emerging behaviors
• How are small networks different from large ones?

3

Network community structure

Communities (groups, clusters, modules): sets of nodes with lots of connections inside and few to outside (the rest of the network)

[Figure labels: communities, clusters, groups, modules]

4

Example 1: Biology

Nodes represent proteins

Edges represent interactions/associations

Proteins with same function interact more

Can use network to discover functional groups

Yeast transcriptional regulatory modules [Bar-Joseph et al., 2003]

5

Example 2: Social networks

Clusters correspond to social communities, organizational units (e.g., departments)

Zachary’s Karate club network
• During the study the club split into 2
• The split corresponds to min-cut (● vs. ■)

6

Example 3: Web (blogs)

Democrat vs. Republican blogs [Adamic-Glance 2005]

7

Example 4: Science

Citations Collaborations

[Newman 2003]

8

Example 5: Hierarchies

Nested communities: modular structure of networks is hierarchically organized

[Figure: hierarchy. University contains Science (CS, Math) and Arts (Drama, Music)]

9

Example 6: Hierarchies

Recursive hierarchical network

(a) N=5, E=8

(b) N=25, E=56

(c) N=125, E=344

10

Discovering communities

Intuition: find nodes that can be easily separated from the rest of the network

Various objective functions:
• Min-cut
• Normalized-cut
• Centrality, Modularity

Various algorithms:
• Spectral clustering (random walks)
• Girvan-Newman (centrality)
• Metis (contraction based)

Girvan-Newman:
1) Betweenness centrality: number of shortest paths passing through an edge
2) Remove edges by decreasing centrality
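The two Girvan-Newman steps can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the adjacency-dict graph format, and the choice to stop at the first split are all assumptions made here. Edge betweenness is computed with a Brandes-style BFS accumulation, and the graph is assumed to be simple (no self-loops).

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    Returns {frozenset({u, v}): number of shortest paths through the edge}.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # BFS from source, counting shortest paths (sigma) to every node.
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = {u: 0.0 for u in order}
        for v in reversed(order):
            for u in preds[v]:
                credit = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += credit
                delta[u] += credit
    # Every undirected path was counted once from each endpoint.
    return {edge: s / 2.0 for edge, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge repeatedly until the graph splits."""
    adj = {u: set(neigh) for u, neigh in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

On two triangles joined by a single edge, that bridge carries all 3 × 3 cross-triangle shortest paths, so it is removed first and the two triangles come out as the communities.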

11

Our question: How well are communities expressed?

12

Our question: How well are communities expressed?

Statistical properties of community structure

Instead of searching for communities, we measure how well expressed communities are

Questions:
• What is the community structure of real-world networks?
• How to measure and quantify this?
• What does this tell us about network structure?
• What is a good model (intuition)?
• What are the consequences for clustering/partitioning algorithms?

13

Community score (quality)

How community-like is a set of nodes?

Need a natural intuitive measure

Conductance (normalized cut): Φ(S) = # edges cut / # edges inside

Small Φ(S) corresponds to more community-like sets of nodes

[Figure: node sets S and S’]
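The score can be computed directly from an adjacency structure. The sketch below assumes a graph stored as a dict mapping each node to a set of neighbours; the function names and the toy graph are illustrative. The first function follows the slide's wording literally (edges cut divided by edges inside). The second is the volume-based form of conductance commonly used in the literature, cut(S) / min(vol(S), vol(rest)), included only for comparison.

```python
def community_score(adj, nodes):
    """Score as worded on the slide: (# edges cut) / (# edges inside)."""
    inside_set = set(nodes)
    cut = 0
    inside_twice = 0  # every internal edge is seen from both endpoints
    for u in inside_set:
        for v in adj[u]:
            if v in inside_set:
                inside_twice += 1
            else:
                cut += 1
    inside = inside_twice // 2
    return cut / inside if inside else float("inf")


def conductance(adj, nodes):
    """Volume-based variant: cut / min(vol(S), vol(rest)), vol = sum of degrees."""
    inside_set = set(nodes)
    cut = sum(1 for u in inside_set for v in adj[u] if v not in inside_set)
    vol = sum(len(adj[u]) for u in inside_set)
    total = sum(len(neigh) for neigh in adj.values())
    denom = min(vol, total - vol)
    return cut / denom if denom else float("inf")


# Toy graph (hypothetical): two triangles joined by one edge.
TWO_TRIANGLES = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
```

For the set {0, 1, 2} in the toy graph there is 1 cut edge and 3 inside edges, so the slide's score is 1/3; the volume of the set is 7, so the volume-based conductance is 1/7. Both agree that smaller values mean a more community-like set.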

14

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

15

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad community: Φ=5/6 = 0.83

What is the “best” community of 5 nodes?

16

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad community: Φ=5/7 = 0.7

Better community: Φ=2/5 = 0.4

What is the “best” community of 5 nodes?

17

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad community: Φ=5/7 = 0.7

Better community: Φ=2/5 = 0.4

Best community: Φ=2/8 = 0.25

What is the “best” community of 5 nodes?

18

Network Community Profile Plot

We define the network community profile (NCP) plot: plot the score of the best community of size k

Search over all subsets of size k and find best: Φ(k=5) = 0.25

NCP plot is intractable to compute
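To make the definition concrete, and to show why it does not scale, here is a brute-force sketch that tries every subset of size k. It assumes the slide's score (edges cut divided by edges inside) and an adjacency-dict graph; the function names are illustrative. The number of subsets grows combinatorially with k, so this is only usable on toy graphs.

```python
from itertools import combinations


def score(adj, nodes):
    """(# edges cut) / (# edges inside); infinite if there are no inside edges."""
    inside_set = set(nodes)
    cut = sum(1 for u in inside_set for v in adj[u] if v not in inside_set)
    inside = sum(1 for u in inside_set for v in adj[u] if v in inside_set) // 2
    return cut / inside if inside else float("inf")


def ncp_brute_force(adj, max_k):
    """Best (lowest) score over all node subsets of each size 1..max_k."""
    nodes = list(adj)
    profile = {}
    for k in range(1, max_k + 1):
        profile[k] = min(score(adj, subset) for subset in combinations(nodes, k))
    return profile
```

Plotting `profile[k]` against `k` on log-log axes gives the NCP plot for a graph small enough to enumerate.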

19

Network Community Profile Plot

We define the network community profile (NCP) plot: plot the score of the best community of size k

[Plot: log Φ(k) versus community size, log k. Marked points: k=5, Φ(k)=0.25; k=7, Φ(k)=0.18]

20

NCP: Example

[Plot: community score, log Φ(k), versus community size, log k]

21

Scaling to large networks

Local spectral clustering algorithm:
• Pick a seed node
• Slowly diffuse mass around it (via a PageRank-like random walk)
• Find the bottleneck

Repeat many times:
• Many seed nodes for very local walks
• Fewer seed nodes for more global (longer) walks
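The seed, diffuse, and bottleneck steps can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the results in these slides: the diffusion here is plain power iteration of a personalized PageRank walk, the bottleneck is found with a sweep cut over nodes ordered by probability divided by degree, and the sweep uses the volume-based conductance cut / min(vol(S), vol(rest)). The parameter names `alpha` and `iters` are hypothetical, and the graph is assumed to have no isolated nodes.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """Random walk that jumps back to the seed with probability alpha."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha  # total mass is 1, so alpha of it returns to the seed
        for u, mass in p.items():
            if mass == 0.0:
                continue
            share = (1.0 - alpha) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def sweep_cut(adj, p):
    """Scan prefixes of the degree-normalized ranking; keep the best bottleneck."""
    ranked = sorted(
        (u for u in adj if p[u] > 0.0),
        key=lambda u: p[u] / len(adj[u]),
        reverse=True,
    )
    total_vol = sum(len(neigh) for neigh in adj.values())
    current, vol, cut = set(), 0, 0
    best_set, best_phi = set(), float("inf")
    for u in ranked:
        links_inside = sum(1 for v in adj[u] if v in current)
        current.add(u)
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * links_inside  # new cut edges minus healed ones
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_set, best_phi = set(current), cut / denom
    return best_set, best_phi


def local_cluster(adj, seed, alpha=0.15, iters=50):
    """Seed -> diffuse -> find the bottleneck."""
    return sweep_cut(adj, personalized_pagerank(adj, seed, alpha, iters))
```

A larger `alpha` keeps the walk closer to the seed (more local); a smaller one lets the mass spread further, which matches the slide's distinction between local and global walks.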

22

What do NCP plots of real networks look like?

23

NCP plot: Small Social Network

Dolphin social network

Two communities of dolphins

[Figure: the network and its NCP plot]

24

NCP plot: Zachary’s karate club

Zachary’s university karate club social network
• During the study the club split into 2
• The split (squares vs. circles) corresponds to cut B

[Figure: the network and its NCP plot]

25

NCP plot: Network Science

Collaborations between scientists in Networks

[Figure: the network and its NCP plot]

26

NCP plot: Grids

[Figure: the network and its NCP plot]

27

NCP plot: Graphs with geometry

[Figure: the network and its NCP plot]

28

NCP plot: Manifold dataset

Manifold learning dataset (Hands)

[Figure: the network and its NCP plot]

29

NCP plot: Power Grid

Eastern US power grid:

30

NCP plot: Hierarchical network

[Figure: the network and its NCP plot]

Small social networks, geometric networks, and hierarchical networks have downward NCP plots

What about large networks?

31

NCP of Large Networks

32

Our work: Large networks

Previously researchers examined community structure of small networks (~100 nodes)

We examined more than 70 different large networks

Large real-world networks look very different!

33

Example of our findings

Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

34

NCP: LiveJournal (N=5M, E=42M)

[Plot: community score versus community size. Annotations: “Better and better communities”; “Best community has 100 nodes”; “Best communities get worse and worse”]

35

Explanation: Downward part

Whiskers are responsible for the downward slope of the NCP plot

A whisker is a set of nodes connected to the rest of the network by a single edge

NCP plot

Largest whisker
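A whisker, as defined above, hangs off the network by one edge, so that edge is a bridge. The sketch below finds whiskers the slow but transparent way: remove each edge in turn, check whether the graph falls apart, and take the smaller side. Treating the smaller side as the whisker (and the larger side as the core) is an assumption made for this illustration, as are the function names. Nested whiskers, such as a chain of pendant nodes, are all reported.

```python
def reachable(adj, start, blocked):
    """Nodes reachable from start without crossing the blocked edge."""
    a, b = blocked
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if (u == a and v == b) or (u == b and v == a):
                continue  # pretend this edge has been cut
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def find_whiskers(adj):
    """Return whiskers (as node sets), largest first."""
    whiskers = []
    checked = set()
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in checked:
                continue
            checked.add(edge)
            side_u = reachable(adj, u, (u, v))
            if v in side_u:
                continue  # still connected, so (u, v) is not a bridge
            side_v = reachable(adj, v, (u, v))
            whiskers.append(min(side_u, side_v, key=len))
    return sorted(whiskers, key=len, reverse=True)


# Toy graph (hypothetical): a 4-clique core, a triangle hanging off node 3,
# and a single pendant node hanging off node 0.
CORE_WITH_WHISKERS = {
    0: {1, 2, 3, 7}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}, 7: {0},
}
```

This runs one graph traversal per edge, which is fine for small examples; a linear-time bridge-finding algorithm would be the natural replacement on large networks.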

36

Explanation: Upward part

Each new edge inside the community costs more

[Figure: NCP plot and a tree in which each node has twice as many children. Annotated scores: Φ=1/3 = 0.33, Φ=2/4 = 0.5, Φ=8/6 = 1.3, Φ=64/14 = 4.6]

37

Comparison to rewired network

• Take a real network G
• Rewire edges for a long time
• We obtain a random graph with the same degree distribution as the real network G
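One common way to rewire while keeping every node's degree fixed is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b). The slides do not say which rewiring procedure was used, so the sketch below is an assumption; the function names, the `n_swaps` and `seed` parameters, and the retry limit are illustrative. Swaps that would create a self-loop or a duplicate edge are rejected.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring by repeated double edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

Each accepted swap removes one edge at each of a, b, c, d and adds one back, so the degree sequence is unchanged while the wiring is randomized.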

38

Comparison to a rewired network

Rewired network: random network with same degree distribution

39

LiveJournal whisker sizes

Whiskers in real networks are larger than expected

40

Whisker shapes

Whiskers in real networks are non-trivial (richer than trees)

Edge to cut

41

Caveat: Bag of whiskers

What if we allow cuts that give disconnected communities?

• Cut all whiskers
• Compose communities out of whiskers
• How good “communities” do we get?

42

Communities made of whiskers

[Plot: community score versus community size, comparing connected communities with bag of whiskers]

We get better community scores when composing disconnected sets of whiskers

43

What if I remove whiskers?

Nothing happens! Now we have 2-edge connected whiskers to deal with.

44

Conclusion: Core+Whiskers

[Plot legend: connected communities, bag of whiskers, rewired network]

45

Conclusion: Network structure

Network structure: core-periphery (jellyfish, octopus)

Whiskers are responsible for good communities

Denser and denser core of the network

Core contains 60% of the nodes and 80% of the edges

46

What is a good model? Intuition?

47

(Sparse) Random graphs

(Sparse) Random graph:
• Start with N nodes
• Pick pairs of nodes uniformly at random and connect them

Flat (long random connections)

Theorem (works for any degree distribution)

Sparsity does not explain our observation
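The random-graph construction on this slide (start with N nodes, connect pairs chosen uniformly at random) can be sketched as below. The function name and the `n`, `m`, and `seed` parameters are illustrative, and the choice to keep the graph simple by discarding repeated pairs is an assumption.

```python
import random


def sparse_random_graph(n, m, seed=0):
    """n nodes, m distinct edges, each pair chosen uniformly at random."""
    if m > n * (n - 1) // 2:
        raise ValueError("more edges requested than distinct node pairs")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)  # two distinct nodes, so no self-loops
        edges.add((min(u, v), max(u, v)))  # the set drops repeated pairs
    return sorted(edges)
```

Keeping `m` on the order of `n` gives the sparse regime the slide refers to.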

48

Preferential attachment

Preferential attachment [Price 1965, Albert & Barabasi 1999]:
• Add a new node, create m out-links
• Probability of linking to a node i is proportional to its degree k_i
• Based on Herbert Simon’s result

Power-laws arise from “Rich get richer” (cumulative advantage)

Flat (connections to hubs – no locality)
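The growth rule above can be sketched with a standard trick: keep a list in which every node appears once per incident edge, so that a uniform draw from the list picks a node with probability proportional to its degree. Starting from a small clique of m+1 nodes is an assumption made here to give the first arrivals something to attach to; the function and parameter names are illustrative.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m existing nodes."""
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and n > m")
    rng = random.Random(seed)
    # Assumed starting point: a clique on nodes 0..m.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in this list once per incident edge.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

Because high-degree nodes occupy more slots in `endpoints`, they are chosen more often, which is the "rich get richer" effect named on the slide.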

49

Small-World model

Let’s exploit local connections

Down (locally network looks like a mesh) and Flat (at large scale network looks random)

50

Geometry + Preferential attachment

Geometric preferential attachment:
• Place nodes at random in 2D
• Pick a node
• Pick nodes in a radius
• Connect preferentially

Flat (locally network is random) and Down (globally network is a mesh – union of local expanders)

51

Model that works: Forest Fire

Forest Fire: connections spread like a fire
• New node joins the network
• Selects a seed node
• Connects to some of its neighbors
• Continue recursively

As community grows it blends into the core of the network
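The Forest Fire steps can be sketched as a simplified, undirected simulation. This is not the published model's exact sampling rule: here each not-yet-burned neighbour catches fire independently with a hypothetical probability `p`, and the new node links to everything that burned. The function name and parameters are assumptions for illustration.

```python
import random


def forest_fire(n, p=0.35, seed=0):
    """Grow an undirected graph by letting links spread like a fire."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        # The new node selects a seed node among the existing ones.
        start = rng.randrange(new)
        burned = {start}
        frontier = [start]
        # The fire spreads recursively to some neighbours of burned nodes.
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                if v not in burned and rng.random() < p:
                    burned.add(v)
                    frontier.append(v)
        # The new node connects to every node the fire reached.
        adj[new] = set()
        for v in burned:
            adj[new].add(v)
            adj[v].add(new)
    return adj
```

A small `p` keeps fires local, so new nodes form small, loosely attached groups; a larger `p` lets fires reach well-connected nodes, pulling new arrivals into the core.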

52

Forest Fire NCP plot

[Plot legend: rewired network, bag of whiskers]

53

Conclusion and connections

Whiskers:
• Largest whisker has ~100 nodes
• Independent of network size
• Dunbar number: a person can maintain social relationships with at most 150 people

Core:
• Core has little structure (hard to cut)
• Still more structure than the random network

54

Connections to previous work

Other researchers examined small networks, so they did not hit Dunbar’s limit

Some evidence: 400k-node Amazon co-purchasing network [Clauset et al. 2004]
▪ Largest community has 50% of all nodes
▪ It was labeled “Miscellaneous”

Karate club has no significant community structure [Newman et al. 2007]

55

Other explanations

• Bond vs. identity communities
• Multiple hierarchies that blur the community boundaries

56

Is there still hope?

• Ground truth
• Yes, use attributes, better link semantics

57

Conclusion and connections

NCP plot is a way to analyze network community structure

Our results agree with previous work on small networks (that are commonly used for testing community finding algorithms)

But large networks are different

Large networks:
• Whiskers + Core structure
• Small, well-isolated communities blend into the core of the network as they grow
