resolution-free community detection
Presentation at NetSci 2011, Budapest, April 8, 2011.
Community Detection Resolution Limit Definition of resolution-free Results
Resolution-free community detection
V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)
(1) ICTEAM, Université Catholique de Louvain
(2) CORE, Université Catholique de Louvain
8 April 2011
Outline
1 Community Detection
2 Resolution Limit
3 Definition of resolution-free
4 Results
Community Detection
• Detect ‘natural’ communities in a network.
• Modularity approach: ‘relatively’ many links inside communities.
Community Detection (formal)
• In general, communities should have relatively
  - many present links (benefit),
  - few missing links (cost).
Minimize
\[ H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr)\, \delta(\sigma_i, \sigma_j). \]
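• Compare to a random null model $p_{ij}$ (RB):
\[ a_{ij} = w_{ij} - b_{ij} \quad\text{and}\quad b_{ij} = \gamma_{RB}\, p_{ij}, \]
\[ H_{RB} = -\sum_{ij} \bigl( A_{ij} w_{ij} - \gamma_{RB}\, p_{ij} \bigr)\, \delta(\sigma_i, \sigma_j). \]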
• Modularity (NG): use the configuration null model
\[ p_{ij} = \frac{k_i k_j}{2m}. \]
- Reichardt and Bornholdt. Phys Rev E (2006) 74:1, 016110
- Newman and Girvan. Phys Rev E (2004) 69:2, 026113
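The general objective can be evaluated directly on a small graph. The following is a minimal sketch, not the authors' code: the toy graph (two triangles joined by one link), the function names, and the convention that the sum runs over all ordered pairs including i = j are assumptions made for illustration. With the configuration null model, unit weights and $\gamma_{RB} = 1$, the value $-H_{RB}/2m$ coincides with the usual modularity.

```python
from itertools import product


def hamiltonian(nodes, adj, sigma, a, b):
    """H = -sum_ij (a_ij A_ij - b_ij (1 - A_ij)) delta(sigma_i, sigma_j).

    Assumption: the sum runs over all ordered pairs (i, j), including i == j.
    """
    total = 0.0
    for i, j in product(nodes, repeat=2):
        if sigma[i] == sigma[j]:
            A = 1 if j in adj[i] else 0
            total += a(i, j) * A - b(i, j) * (1 - A)
    return -total


# Toy graph (illustrative assumption): two triangles joined by the link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
nodes = range(6)
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

degree = {v: len(adj[v]) for v in nodes}
two_m = sum(degree.values())
gamma_rb = 1.0


def b_rb(i, j):
    # RB cost weight with the configuration null model p_ij = k_i k_j / 2m
    return gamma_rb * degree[i] * degree[j] / two_m


def a_rb(i, j):
    # RB benefit weight a_ij = w_ij - b_ij, with w_ij = 1 (unweighted graph)
    return 1.0 - b_rb(i, j)


sigma = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one community per triangle
h_rb = hamiltonian(nodes, adj, sigma, a_rb, b_rb)
modularity = -h_rb / two_m
```

Passing other callables for `a` and `b` gives the other models in the same framework without touching `hamiltonian`.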
Resolution limit
• Modularity might miss ‘small’ communities.
• Merge two cliques in a ring of $q$ cliques of $n_c$ nodes each when
\[ \gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}. \]
• Depends on the total size of the graph.
• Number of communities scales as $\sqrt{\gamma_{RB}\, m}$.
• For a general null model, the problem remains since
\[ \sum_{ij} p_{ij} = 2m. \]
- Fortunato and Barthélemy. PNAS (2007) 104:1, pp. 36
- Kumpula et al. Eur Phys J B (2007) 56, pp. 41-45
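The merge condition for the ring of cliques can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, neighbouring cliques are joined by a single link, weights are 1, and the sum in $H_{RB}$ runs over all ordered pairs including i = j. It compares the partition with one community per clique against the partition that pairs up neighbouring cliques.

```python
def ring_of_cliques(q, n_c):
    """Adjacency sets: q cliques of n_c nodes, consecutive cliques joined by one link."""
    adj = {v: set() for v in range(q * n_c)}
    for c in range(q):
        members = range(c * n_c, (c + 1) * n_c)
        for u in members:
            adj[u].update(v for v in members if v != u)
        # single link from the last node of clique c to the first node of the next one
        u, v = (c + 1) * n_c - 1, ((c + 1) % q) * n_c
        adj[u].add(v)
        adj[v].add(u)
    return adj


def h_rb(adj, sigma, gamma):
    """H_RB with p_ij = k_i k_j / 2m and w_ij = 1, summed over all ordered pairs."""
    two_m = sum(len(nb) for nb in adj.values())
    links, degrees = {}, {}
    for i, nb in adj.items():
        c = sigma[i]
        links[c] = links.get(c, 0) + sum(1 for j in nb if sigma[j] == c)
        degrees[c] = degrees.get(c, 0) + len(nb)
    # within community c: sum_ij A_ij = links[c], sum_ij k_i k_j = degrees[c] ** 2
    return -sum(links[c] - gamma * degrees[c] ** 2 / two_m for c in links)


def prefers_merge(q, n_c, gamma):
    """True if pairing up neighbouring cliques gives a lower H_RB than keeping them apart."""
    adj = ring_of_cliques(q, n_c)
    separate = {v: v // n_c for v in adj}
    merged = {v: (v // n_c) // 2 for v in adj}   # assumes q is even
    return h_rb(adj, merged, gamma) < h_rb(adj, separate, gamma)


n_c, gamma = 3, 1.0
bound = n_c * (n_c - 1) + 2                  # merge expected when gamma < q / bound
small_ring = prefers_merge(6, n_c, gamma)    # 1.0 < 6/8 does not hold
large_ring = prefers_merge(10, n_c, gamma)   # 1.0 < 10/8 holds
```

The same triangles are kept apart in the ring of 6 and merged in the ring of 10, although nothing changed locally: only the total size of the graph differs.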
Evading the resolution limit
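• New model (RN) suggested:
\[ a_{ij} = w_{ij}, \qquad b_{ij} = \gamma_{RN}, \]
\[ H_{RN} = -\sum_{ij} \bigl( A_{ij} (w_{ij} + \gamma_{RN}) - \gamma_{RN} \bigr)\, \delta(\sigma_i, \sigma_j). \]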
• Claim: no resolution limit, as merging depends only on ‘local’ variables:
\[ \gamma_{RN} < \frac{1}{n_c^2 - 1}. \]
• But taking $p_{ij} = k_i k_j$ (rescaling $\gamma_{RB}$ by $2m$), we obtain
\[ \gamma_{RB} < \frac{1}{2\,(n_c(n_c - 1) + 2)^2}, \]
also only ‘local’ variables. Hence, also no resolution limit?
- Ronhovde and Nussinov. Phys Rev E (2010) 81:4, 046114.
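The claim that the RN merge condition involves only $n_c$ can be illustrated on rings of different length. This is a sketch under assumptions, not the original authors' implementation: helper names are hypothetical, weights are 1, neighbouring cliques share one link, and the sum in $H_{RN}$ runs over all ordered pairs including i = j (the i = j terms add the same constant to every partition, so they do not affect the comparison).

```python
def ring_of_cliques(q, n_c):
    """Adjacency sets: q cliques of n_c nodes, consecutive cliques joined by one link."""
    adj = {v: set() for v in range(q * n_c)}
    for c in range(q):
        members = range(c * n_c, (c + 1) * n_c)
        for u in members:
            adj[u].update(v for v in members if v != u)
        u, v = (c + 1) * n_c - 1, ((c + 1) % q) * n_c
        adj[u].add(v)
        adj[v].add(u)
    return adj


def h_rn(adj, sigma, gamma):
    """H_RN = -sum_ij (A_ij (1 + gamma) - gamma) delta(sigma_i, sigma_j), with w_ij = 1."""
    sizes, links = {}, {}
    for i, nb in adj.items():
        c = sigma[i]
        sizes[c] = sizes.get(c, 0) + 1
        links[c] = links.get(c, 0) + sum(1 for j in nb if sigma[j] == c)
    return -sum(links[c] * (1 + gamma) - gamma * sizes[c] ** 2 for c in sizes)


def rn_prefers_merge(q, n_c, gamma):
    """True if pairing up neighbouring cliques lowers H_RN (assumes q is even)."""
    adj = ring_of_cliques(q, n_c)
    separate = {v: v // n_c for v in adj}
    merged = {v: (v // n_c) // 2 for v in adj}
    return h_rn(adj, merged, gamma) < h_rn(adj, separate, gamma)


n_c = 3                                   # threshold 1 / (n_c**2 - 1) = 0.125
below = [rn_prefers_merge(q, n_c, 0.10) for q in (4, 40)]
above = [rn_prefers_merge(q, n_c, 0.15) for q in (4, 40)]
```

For both ring lengths the decision flips at the same value of $\gamma_{RN}$, in line with the threshold on the slide.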
Problems remain
Subgraph
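• Assume $p_{ij} = k_i k_j$ (rescale $\gamma_{RB}$ by $2m$).
• Then the cliques are separate in the large graph when
\[ \gamma_{RB} > \frac{1}{2\,(n_c(n_c - 1) + 2)^2}. \]
• But they are merged in the subgraph when
\[ \gamma_{RB} < \frac{1}{2\,(n_c(n_c - 1) + 1)^2}. \]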
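A short arithmetic check shows that the two conditions can hold at once. The sketch takes both thresholds exactly as written on the slide. It reads $n_c(n_c-1)+2$ as the total degree of a clique inside the ring (two external links) and $n_c(n_c-1)+1$ as its total degree in the two-clique subgraph (one external link). That reading and the function name are assumptions.

```python
from fractions import Fraction


def slide_threshold(clique_degree_sum):
    """Threshold 1 / (2 K^2) as written on the slide; K is the total degree of one clique."""
    return Fraction(1, 2 * clique_degree_sum ** 2)


n_c = 3
large = slide_threshold(n_c * (n_c - 1) + 2)   # clique inside the large graph
sub = slide_threshold(n_c * (n_c - 1) + 1)     # clique inside the two-clique subgraph

gamma = Fraction(1, 100)                        # any value strictly between the two
separate_in_large_graph = gamma > large
merged_in_subgraph = gamma < sub
```

For triangles the window is $1/128 < \gamma_{RB} < 1/98$: inside it the same two cliques are kept apart in the large graph and merged in the subgraph.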
Resolution limit revisited
[Figure: two panels labelled ‘Resolution-limit’ and ‘Resolution-free’]
• Problem is not merging per se.
• Rather, cliques separate in subgraph, but merge in large graph (or vice versa).
• Suggests following definition.
Defining resolution-free
Definition (Resolution-free)
An objective function $H$ is called resolution-free if, whenever a partition $\mathcal{C}$ is optimal for $G$, any subpartition $\mathcal{D} \subset \mathcal{C}$ is also optimal for the subgraph $H(\mathcal{D}) \subset G$ induced by $\mathcal{D}$.
• Implicitly defines the resolution limit: a method has one if it is not resolution-free.
• Some nice properties of resolution-free methods:
  - Optimal subpartitions can be replaced by other optimal subpartitions
  - Cliques are never split (unless into single nodes)
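The definition can be tested by brute force on a tiny graph. The sketch below is a check on a single instance, not a proof. It uses the CPM objective $-\sum_c (e_c - \gamma n_c^2)$ with $e_c$ counted over ordered pairs (each link twice). The toy graph (three triangles in a chain), the value $\gamma = 0.5$ and all function names are assumptions chosen for illustration. It enumerates every partition, takes one optimum, and verifies that each subpartition is optimal on its induced subgraph.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` in a block of its own
        for k in range(len(part)):                  # or added to an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def cpm_cost(partition, edges, gamma):
    """CPM objective with ordered-pair counting: -(sum_c 2 * links_c - gamma * n_c^2)."""
    label = {v: c for c, block in enumerate(partition) for v in block}
    internal = sum(1 for u, v in edges if label[u] == label[v])
    return -(2 * internal - gamma * sum(len(b) ** 2 for b in partition))


def subpartitions_stay_optimal(nodes, edges, gamma):
    """Check the resolution-free property for one optimal partition, by exhaustive search."""
    best = min(set_partitions(nodes), key=lambda p: cpm_cost(p, edges, gamma))
    for size in range(1, len(best)):
        for sub in combinations(best, size):
            sub = list(sub)
            kept = {v for block in sub for v in block}
            sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
            sub_best = min(set_partitions(sorted(kept)),
                           key=lambda p: cpm_cost(p, sub_edges, gamma))
            gap = cpm_cost(sub, sub_edges, gamma) - cpm_cost(sub_best, sub_edges, gamma)
            if gap > 1e-9:
                return False
    return True


# Toy graph (illustrative assumption): three triangles joined in a chain.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6)]
nodes = list(range(9))
holds = subpartitions_stay_optimal(nodes, edges, 0.5)
```

Exhaustive enumeration grows with the Bell numbers, so this approach is only practical for graphs of roughly ten nodes or fewer.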
Main questions
• Do such methods exist?
• What conditions to impose?
General framework
General community detection
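\[ H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr)\, \delta(\sigma_i, \sigma_j). \]
RB model: set $a_{ij} = w_{ij} - b_{ij}$, $b_{ij} = \gamma_{RB}\, p_{ij}$.
RN model: set $a_{ij} = w_{ij}$, $b_{ij} = \gamma_{RN}$.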
Simpler alternative
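CPM: set $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma$. This leads to
\[ H = -\sum_{ij} \bigl( A_{ij} w_{ij} - \gamma \bigr)\, \delta(\sigma_i, \sigma_j). \]
Clear interpretation: $\gamma$ is the minimum density of a community,
\[ H = -\sum_c \bigl( e_c - \gamma n_c^2 \bigr). \]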
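The pairwise and per-community forms of the CPM objective can be compared numerically. The following sketch assumes unit weights, a sum over all ordered pairs including i = j, and therefore $e_c = \sum_{i,j \in c} A_{ij}$ (each undirected link counted twice). The toy graph and function names are illustrative. Under this convention a community lowers $H$ exactly when $e_c / n_c^2 > \gamma$, which is the density reading of $\gamma$.

```python
def cpm_pairwise(adj, sigma, gamma):
    """H = -sum_ij (A_ij - gamma) delta(sigma_i, sigma_j), all ordered pairs, w_ij = 1."""
    total = 0.0
    for i in adj:
        for j in adj:
            if sigma[i] == sigma[j]:
                total += (1 if j in adj[i] else 0) - gamma
    return -total


def cpm_by_community(adj, sigma, gamma):
    """H = -sum_c (e_c - gamma * n_c^2), with e_c the sum of A_ij inside community c."""
    e, n = {}, {}
    for i, nb in adj.items():
        c = sigma[i]
        n[c] = n.get(c, 0) + 1
        e[c] = e.get(c, 0) + sum(1 for j in nb if sigma[j] == c)
    return -sum(e[c] - gamma * n[c] ** 2 for c in n)


# Toy graph (illustrative assumption): two triangles joined by the link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

sigma = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
h_pairs = cpm_pairwise(adj, sigma, 0.5)
h_comms = cpm_by_community(adj, sigma, 0.5)
```

Each triangle has $e_c / n_c^2 = 6/9 > 0.5$, so both communities contribute negatively to $H$ at this value of $\gamma$.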
Main result
Do resolution-free methods exist?
Yes: both RN and CPM are resolution-free, which follows from a general theorem.
What conditions to impose?
Sufficient condition: $a_{ij}$ and $b_{ij}$ should be ‘local’.
Definition (Local weights)
Weights $a_{ij}$, $b_{ij}$ are called local whenever, for every subgraph $H \subset G$, the weights remain similar, i.e. $a_{ij}(G) \sim a_{ij}(H)$ and $b_{ij}(G) \sim b_{ij}(H)$.
• Implies local weights $a_{ij}$ and $b_{ij}$ can only depend on node $i$ and node $j$, nothing further.
• RN and CPM use local weights, hence they are resolution-free.
• Not a necessary condition, but there seem to be few exceptions.
• So, RN and CPM are (almost) the only sensible definitions.
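The difference between local and non-local weights shows up as soon as a subgraph is taken. The sketch below compares the cost weight $b_{ij}$ of the configuration null model with that of CPM on a toy graph and on one induced subgraph. The slide does not spell out the relation $\sim$, so the code reports the ratio $b_{ij}(G) / b_{ij}(H)$ for two node pairs. Reading "similar" as "equal, or at least one common ratio for all pairs" is an assumption, as are the graph and the function names.

```python
def rb_weight(adj, i, j, gamma):
    """b_ij = gamma * k_i k_j / 2m (configuration null model), evaluated on `adj`."""
    two_m = sum(len(nb) for nb in adj.values())
    return gamma * len(adj[i]) * len(adj[j]) / two_m


def cpm_weight(adj, i, j, gamma):
    """b_ij = gamma for CPM; the graph argument is deliberately unused."""
    return gamma


def induced(adj, keep):
    """Subgraph induced by the node set `keep`."""
    return {v: adj[v] & keep for v in keep}


# Toy graph (illustrative assumption): two triangles joined by the link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
G = {v: set() for v in range(6)}
for u, v in edges:
    G[u].add(v)
    G[v].add(u)
H = induced(G, {0, 1, 2})          # keep only the first triangle

pairs = [(0, 1), (0, 2)]
rb_ratios = [rb_weight(G, i, j, 1.0) / rb_weight(H, i, j, 1.0) for i, j in pairs]
cpm_ratios = [cpm_weight(G, i, j, 1.0) / cpm_weight(H, i, j, 1.0) for i, j in pairs]
```

The configuration-model weights change by a different factor for each pair, because degrees and $2m$ depend on the rest of the graph, while the CPM weights are untouched by the restriction.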
Performance (directed networks)
[Figure: NMI (vertical axis, up to 1) against µ (horizontal axis, 0 to 1.0) for CPM, Infomap, Modularity and ER, shown for $n = 10^3$ and $n = 10^4$.]
Conclusions
• Provided definition of resolution-free.
• Methods using local weights are resolution-free.
• Clarifies link between ‘local’ methods and resolution limit.
• Only a few resolution-free methods.
• Tested CPM, performs superbly.
Thank you for your attention.
Questions?
- Traag, Van Dooren and Nesterov. arXiv:1104.3083v1