resolution-free community detection



DESCRIPTION

Presentation at NetSci 2011, Budapest, April 8, 2011.

TRANSCRIPT

Page 1: Resolution-free community detection

Community Detection Resolution Limit Definition of resolution-free Results

Resolution-free community detection

V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)

(1) ICTEAM, Université Catholique de Louvain

(2) CORE, Université Catholique de Louvain

8 April 2011

Page 2: Resolution-free community detection


Outline

1 Community Detection

2 Resolution Limit

3 Definition of resolution-free

4 Results

Page 3: Resolution-free community detection


Community Detection

• Detect ‘natural’ communities in network.

• Modularity approach: ‘relatively’ many links inside communities.

Page 4: Resolution-free community detection


Community Detection (formal)

• In general, communities should have relatively
  - many present links (benefit),
  - few missing links (cost).

Minimize

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr)\, \delta(\sigma_i, \sigma_j).$$

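• Compare to random null model $p_{ij}$ (RB): set $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma_{RB} p_{ij}$, so that

$$H_{RB} = -\sum_{ij} \bigl( A_{ij} w_{ij} - \gamma_{RB} p_{ij} \bigr)\, \delta(\sigma_i, \sigma_j).$$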

• Modularity (NG): use configuration null model

$$p_{ij} = \frac{k_i k_j}{2m}.$$

Reichardt and Bornholdt, Phys Rev E (2006) 74:1, 016110

Newman and Girvan, Phys Rev E (2004) 69:2, 026113
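The general cost function and its RB special case can be checked against each other numerically. The sketch below is only an illustration under stated assumptions: the sum runs over all ordered pairs (i, j), including i = j, links are unweighted (w_ij = 1), and the function names and the two-triangle test graph are made up for this example.

```python
from itertools import product

def hamiltonian(A, sigma, a, b):
    """General form: H = -sum_ij (a_ij A_ij - b_ij (1 - A_ij)) delta(s_i, s_j)."""
    n = len(A)
    total = 0.0
    for i, j in product(range(n), repeat=2):
        if sigma[i] == sigma[j]:
            total += a(i, j) * A[i][j] - b(i, j) * (1 - A[i][j])
    return -total

def rb_weights(A, gamma):
    """RB weights with the configuration null model p_ij = k_i k_j / 2m."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    b = lambda i, j: gamma * k[i] * k[j] / two_m
    a = lambda i, j: 1.0 - b(i, j)  # a_ij = w_ij - b_ij with w_ij = 1 (assumed)
    return a, b

def h_rb_direct(A, sigma, gamma):
    """Simplified form: H_RB = -sum_ij (A_ij w_ij - gamma p_ij) delta(s_i, s_j)."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    n = len(A)
    return -sum(A[i][j] - gamma * k[i] * k[j] / two_m
                for i in range(n) for j in range(n) if sigma[i] == sigma[j])

# Hypothetical test graph: two triangles joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
sigma = [0, 0, 0, 1, 1, 1]  # one community per triangle

a, b = rb_weights(A, gamma=1.0)
h_general = hamiltonian(A, sigma, a, b)
h_direct = h_rb_direct(A, sigma, gamma=1.0)
```

Under these assumptions both forms give the same value for the two-triangle partition, which is the algebraic step the slide performs when it substitutes the RB weights.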

Page 5: Resolution-free community detection


Resolution limit

• Modularity might miss ‘small’ communities.

• Merge two cliques in ring of cliques when

$$\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}.$$

• Depends on the total size of the graph.

• Number of communities scales as $\sqrt{\gamma_{RB} m}$.

• For general null model, problem remains since

$$\sum_{ij} p_{ij} = 2m.$$

Fortunato and Barthélemy, PNAS (2007) 104:1, pp. 36

Kumpula et al., Eur Phys J B (2007) 56, pp. 41-45
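The dependence of the merge condition on total graph size can be reproduced on a small scale. The following sketch assumes that q is the number of cliques in the ring and n_c the number of nodes per clique (the slide does not define the symbols), that neighbouring cliques are joined by a single unweighted link, and that the configuration null model is used. All function names are hypothetical.

```python
def ring_of_cliques(q, nc):
    """Adjacency sets for q cliques of nc nodes, neighbours joined by one link."""
    adj = [set() for _ in range(q * nc)]
    for c in range(q):
        nodes = range(c * nc, (c + 1) * nc)
        for i in nodes:
            for j in nodes:
                if i != j:
                    adj[i].add(j)
        u = c * nc + nc - 1          # last node of clique c
        v = ((c + 1) % q) * nc       # first node of the next clique
        adj[u].add(v)
        adj[v].add(u)
    return adj

def h_rb(adj, sigma, gamma):
    """H_RB = -sum_c (E_c - gamma K_c^2 / 2m), summed per community."""
    two_m = sum(len(nb) for nb in adj)
    inside, degree = {}, {}
    for i, nb in enumerate(adj):
        c = sigma[i]
        degree[c] = degree.get(c, 0) + len(nb)
        inside[c] = inside.get(c, 0) + sum(1 for j in nb if sigma[j] == c)
    return -sum(inside[c] - gamma * degree[c] ** 2 / two_m for c in degree)

def merge_is_better(q, nc, gamma):
    """True if merging two adjacent cliques lowers H_RB."""
    adj = ring_of_cliques(q, nc)
    separate = [i // nc for i in range(q * nc)]
    merged = [0 if c == 1 else c for c in separate]  # join cliques 0 and 1
    return h_rb(adj, merged, gamma) < h_rb(adj, separate, gamma)

# Same cliques, same gamma, different ring length.
big_ring_merges = merge_is_better(q=30, nc=5, gamma=1.0)    # 30/22 > 1
small_ring_merges = merge_is_better(q=10, nc=5, gamma=1.0)  # 10/22 < 1
```

With cliques of five nodes and γ_RB = 1, the two cliques merge in the ring of 30 but stay separate in the ring of 10, even though nothing about the cliques themselves has changed.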

Page 6: Resolution-free community detection


Evading the resolution limit

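• New model (RN) suggested: set $a_{ij} = w_{ij}$ and $b_{ij} = \gamma_{RN}$, so that

$$H_{RN} = -\sum_{ij} \bigl( A_{ij} (w_{ij} + \gamma_{RN}) - \gamma_{RN} \bigr)\, \delta(\sigma_i, \sigma_j).$$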

• Claim: no resolution limit, as merge depends only on ‘local’ variables

$$\gamma_{RN} < \frac{1}{n_c^2 - 1}.$$

• But, take $p_{ij} = k_i k_j$ (rescale $\gamma_{RB}$ by $2m$), we obtain

$$\gamma_{RB} < \frac{1}{2 (n_c(n_c - 1) + 2)^2},$$

also only ‘local’ variables. Hence, also no resolution limit?

Ronhovde and Nussinov, Phys Rev E (2010) 81:4, 046114.
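The RN merge condition can be recovered in a few lines. The derivation below is a sketch assuming two cliques of n_c nodes each, joined by a single unweighted link (w_ij = 1), with the sum in H_RN taken over ordered pairs so that every cross pair is counted twice.

```latex
% Merging the two cliques places n_c^2 cross pairs {i,j} in one community:
% one of them is a present link, the other n_c^2 - 1 are missing links.
\begin{align*}
\text{present cross link:}\quad
  & A_{ij}(w_{ij} + \gamma_{RN}) - \gamma_{RN} = 1, \\
\text{missing cross link:}\quad
  & A_{ij}(w_{ij} + \gamma_{RN}) - \gamma_{RN} = -\gamma_{RN}, \\
\Delta H_{RN}
  &= -2\,\bigl[\, 1 - \gamma_{RN}\,(n_c^2 - 1) \,\bigr], \\
\Delta H_{RN} < 0
  \;&\Longleftrightarrow\; \gamma_{RN} < \frac{1}{n_c^2 - 1}.
\end{align*}
```

Only the clique size n_c enters the condition; the number of cliques and the total number of links do not appear.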

Page 7: Resolution-free community detection


Problems remain

Subgraph

• Assume $p_{ij} = k_i k_j$ (rescale $\gamma_{RB}$ by $2m$).

• Then separate in large graph when

$$\gamma_{RB} > \frac{1}{2 (n_c(n_c - 1) + 2)^2}.$$

• But merged in subgraph when

$$\gamma_{RB} < \frac{1}{2 (n_c(n_c - 1) + 1)^2}.$$
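The two conditions leave a window of γ_RB values in which the same pair of cliques is separate in the large graph yet merged in the subgraph. The sketch below simply evaluates the two bounds exactly as printed on the slide, using exact fractions; the function names are made up for this illustration.

```python
from fractions import Fraction

def separate_in_large_graph_above(nc):
    """Bound as printed on the slide: 1 / (2 (nc(nc-1) + 2)^2)."""
    return Fraction(1, 2 * (nc * (nc - 1) + 2) ** 2)

def merged_in_subgraph_below(nc):
    """Bound as printed on the slide: 1 / (2 (nc(nc-1) + 1)^2)."""
    return Fraction(1, 2 * (nc * (nc - 1) + 1) ** 2)

nc = 5
low = separate_in_large_graph_above(nc)
high = merged_in_subgraph_below(nc)

# Any gamma strictly between the two bounds behaves inconsistently.
gamma = (low + high) / 2
inconsistent = low < gamma < high
```

For n_c = 5 the window is 1/968 < γ_RB < 1/882. It is narrow, but it is non-empty for every clique size, because the second denominator is always the smaller one.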

Page 8: Resolution-free community detection


Resolution limit revisited

Resolution-limit

Resolution-free

• Problem is not merging per se.

• Rather, cliques separate in subgraph, but merge in large graph (or vice versa).

• Suggests following definition.

Page 9: Resolution-free community detection


Resolution limit revisited


Definition (Resolution-free)

An objective function H is called resolution-free if, whenever a partition C is optimal for G, every subpartition D ⊂ C is also optimal for the subgraph H(D) ⊂ G induced by D.

Page 10: Resolution-free community detection


Defining resolution-free


• Implicitly defines the resolution limit: a method has one if it is not resolution-free.

• Some nice properties of resolution-free methods:
  - Replace optimal subpartitions
  - Never split cliques (unless into single nodes)

Main questions

• Do such methods exist?

• What conditions to impose?

Page 11: Resolution-free community detection


General framework

General community detection

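$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr)\, \delta(\sigma_i, \sigma_j)$$

RB model: set $a_{ij} = w_{ij} - b_{ij}$, $b_{ij} = \gamma_{RB} p_{ij}$.

RN model: set $a_{ij} = w_{ij}$, $b_{ij} = \gamma_{RN}$.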

Simpler alternative

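CPM: set $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma$. Leads to

$$H = -\sum_{ij} \bigl( A_{ij} w_{ij} - \gamma \bigr)\, \delta(\sigma_i, \sigma_j).$$

Clear interpretation: γ is minimum density of a community

$$H = -\sum_c \bigl( e_c - \gamma n_c^2 \bigr).$$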
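The step from the pair sum to the per-community sum, and the reading of γ as a density threshold, can be illustrated numerically. This sketch assumes unweighted links, a sum over all ordered pairs including i = j, and e_c defined as the sum of adjacency entries inside community c (so each undirected link counts twice). The function names and the two-triangle graph are hypothetical.

```python
def cpm_pair_sum(A, sigma, gamma):
    """H = -sum_ij (A_ij w_ij - gamma) delta(s_i, s_j), with w_ij = 1."""
    n = len(A)
    return -sum(A[i][j] - gamma
                for i in range(n) for j in range(n) if sigma[i] == sigma[j])

def cpm_community_sum(A, sigma, gamma):
    """H = -sum_c (e_c - gamma n_c^2)."""
    n = len(A)
    size, inside = {}, {}
    for i in range(n):
        c = sigma[i]
        size[c] = size.get(c, 0) + 1
        inside[c] = inside.get(c, 0) + sum(
            A[i][j] for j in range(n) if sigma[j] == c)
    return -sum(inside[c] - gamma * size[c] ** 2 for c in size)

# Two triangles joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

gamma = 0.5
triangles = [0, 0, 0, 1, 1, 1]
one_block = [0, 0, 0, 0, 0, 0]

h_pairs = cpm_pair_sum(A, triangles, gamma)
h_comms = cpm_community_sum(A, triangles, gamma)
h_merged = cpm_community_sum(A, one_block, gamma)
```

Each triangle has density e_c / n_c² = 6/9, above γ = 0.5, so it lowers H. The merged block has density 14/36, below γ, so merging raises H and the triangles stay separate.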

Page 12: Resolution-free community detection


Main result

Do resolution-free methods exist?

Yes: both RN and CPM are resolution-free; this follows from a general theorem.

What conditions to impose?

Sufficient condition: $a_{ij}$ and $b_{ij}$ should be ‘local’.

Definition (Local weights)

Weights $a_{ij}$, $b_{ij}$ are called local whenever, for every subgraph $H \subset G$, the weights remain similar, i.e. $a_{ij}(G) \sim a_{ij}(H)$ and $b_{ij}(G) \sim b_{ij}(H)$.

• Implies local weights $a_{ij}$ and $b_{ij}$ can only depend on node $i$ and node $j$, nothing further.

• RN and CPM use local weights, hence resolution-free.

• Not necessary condition, but seem to be few exceptions.

• So, RN and CPM (almost) only sensible definitions.
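The difference between local and non-local weights shows up as soon as a subgraph is taken. The sketch below compares the configuration-model cost b_ij = γ k_i k_j / 2m with the CPM cost b_ij = γ on a hypothetical ring of three triangles and on the subgraph induced by two of them. Reading "∼" as "equal up to one common positive factor" is an assumption made for this example, as are all function names.

```python
def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the links among them."""
    keep = set(nodes)
    return {i: adj[i] & keep for i in keep}

def config_weight(adj, i, j, gamma=1.0):
    """b_ij = gamma k_i k_j / 2m: depends on degrees and total link count."""
    two_m = sum(len(nb) for nb in adj.values())
    return gamma * len(adj[i]) * len(adj[j]) / two_m

def cpm_weight(adj, i, j, gamma=1.0):
    """b_ij = gamma: ignores the graph entirely."""
    return gamma

# Ring of three triangles; the subgraph keeps the first two.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6), (8, 0)]
G = build_graph(edges)
H = induced_subgraph(G, range(6))

# Ratio of subgraph weight to full-graph weight for two node pairs.
ratio_01 = config_weight(H, 0, 1) / config_weight(G, 0, 1)
ratio_12 = config_weight(H, 1, 2) / config_weight(G, 1, 2)
cpm_unchanged = cpm_weight(G, 0, 1) == cpm_weight(H, 0, 1)
```

In this example the configuration-model weights change by different factors for the two pairs, so no single rescaling maps the weights on G to those on H, while the CPM weight is identical in both graphs.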

Page 13: Resolution-free community detection


Performance (directed networks)

[Figure: NMI (vertical axis, up to 1) against µ (horizontal axis, 0 to 1.0) for CPM, Infomap, Modularity and ER, shown for n = 10^3 and n = 10^4.]

Page 14: Resolution-free community detection


Conclusions

• Provided definition of resolution-free.

• Methods using local weights are resolution-free.

• Clarifies link between ‘local’ methods and resolution limit.

• Only few resolution-free methods.

• Tested CPM, performs superbly.

Thank you for your attention.

Questions?

Traag, Van Dooren and Nesterov, arXiv:1104.3083v1