Limits of community detection
Presentation at seminar, Université catholique de Louvain, March 20, 2012
Introduction Modularity Problem 1: Incomparability Problem 2: Resolution-limit Resolution-limit-free
Limits of community detection
V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)
(1) ICTEAM, Université catholique de Louvain
(2) CORE, Université catholique de Louvain
20 March 2012
Outline
1 Introduction
2 Modularity
3 Problem 1: Incomparability
4 Problem 2: Resolution-limit
5 Resolution-limit-free
Understanding Networks
Facebook (2011)
Understanding Networks
Blogosphere Presidential Election (2004)
Understanding Networks
Dutch Debate on Integration (2002-04)
[Figure: network of the people, parties and organizations named in the debate, for example Pim Fortuyn, Ayaan Hirsi Ali, Paul Scheffer, PvdA, VVD and WRR; the full list of node labels is omitted.]
Community Detection
• Detect ‘natural’ communities in networks
• Basic idea: ‘relatively’ many links inside communities
First principle approach
• Reward “good” links, penalize “bad” links.
• Assume the contributions of links within and between communities are equal.
• The objective then simplifies to a sum over internal links only.
Reward (+) or penalize (−), by whether the link is present or missing and whether it lies within or between communities:
            Present   Missing
Within         +         −
Between        −         +
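Weights by link status ($A_{ij} = 1$ present, $A_{ij} = 0$ missing) and community membership ($\delta_{ij} = 1$ within, $\delta_{ij} = 0$ between):
                     $A_{ij} = 1$    $A_{ij} = 0$
$\delta_{ij} = 1$     $a_{ij}$        $-b_{ij}$
$\delta_{ij} = 0$     $-c_{ij}$       $d_{ij}$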
With equal contributions within and between communities ($c_{ij} = a_{ij}$, $d_{ij} = b_{ij}$):
                     $A_{ij} = 1$    $A_{ij} = 0$
$\delta_{ij} = 1$     $a_{ij}$        $-b_{ij}$
$\delta_{ij} = 0$     $-a_{ij}$       $b_{ij}$
Objective function
$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta_{ij}$
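The objective can be evaluated directly on a small graph. The following is a minimal sketch, not the authors' implementation: the function name `objective`, the constant weights and the two-triangle example graph are assumptions made for illustration. The sum runs over all ordered pairs, including $i = j$; the diagonal only adds a constant that is the same for every partition.

```python
import numpy as np

def objective(A, a, b, labels):
    """H = -sum_ij (a_ij A_ij - b_ij (1 - A_ij)) delta_ij over all ordered pairs."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    # delta_ij = 1 when i and j carry the same community label
    delta = (labels[:, None] == labels[None, :]).astype(float)
    return -np.sum((a * A - b * (1.0 - A)) * delta)

# Illustrative graph (an assumption): two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Constant weights a_ij = b_ij = 0.5, chosen only for the example.
H_split = objective(A, 0.5, 0.5, [0, 0, 0, 1, 1, 1])   # one community per triangle
H_merged = objective(A, 0.5, 0.5, [0, 0, 0, 0, 0, 0])  # everything in one community
print(H_split, H_merged)
```

Lower $H$ is better, so with these weights keeping the two triangles separate is preferred over merging them.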
Null-model
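Objective function
$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta_{ij}$
• Introduce weights $a_{ij} = 1 - \gamma_{RB} p_{ij}$ and $b_{ij} = \gamma_{RB} p_{ij}$.
• Null-model $p_{ij}$, constrained by $\sum_{ij} p_{ij} = 2m$.
• Parameter $\gamma = \gamma_{RB}$ is known as the resolution parameter:
  - Higher $\gamma$ ⇒ smaller communities.
  - Lower $\gamma$ ⇒ larger communities.
• Leads to $H_{RB} = -\sum_{ij} (A_{ij} - \gamma p_{ij}) \delta_{ij}$.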
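The last step follows by substituting the weights into the general objective. A short worked derivation, writing $\gamma$ for $\gamma_{RB}$:

```latex
\begin{aligned}
a_{ij} A_{ij} - b_{ij} (1 - A_{ij})
  &= (1 - \gamma p_{ij}) A_{ij} - \gamma p_{ij} (1 - A_{ij}) \\
  &= A_{ij} - \gamma p_{ij} A_{ij} - \gamma p_{ij} + \gamma p_{ij} A_{ij} \\
  &= A_{ij} - \gamma p_{ij}
\end{aligned}
```

The cross terms $\gamma p_{ij} A_{ij}$ cancel, which is why only $A_{ij} - \gamma p_{ij}$ remains inside the sum.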
Configuration Null-model
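Reichardt-Bornholdt objective function
$H_{RB} = -\sum_{ij} (A_{ij} - \gamma p_{ij}) \delta_{ij}$
• Rewire links, but keep degrees unchanged.
• The probability of a link between $i$ and $j$ is then $p_{ij} = \frac{k_i k_j}{2m}$, where $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$.
$\gamma = 1$ leads to modularity $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta_{ij} \sim -H$.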
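As a concrete check of the formula, modularity with the configuration null model can be computed with a few lines of NumPy. This is a sketch under assumptions: the function name `modularity` and the two-triangle test graph are illustrative and do not come from the slides.

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Q = (1/2m) sum_ij (A_ij - gamma * k_i k_j / 2m) delta_ij for an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                 # node degrees k_i
    two_m = k.sum()                   # 2m = sum of degrees
    P = np.outer(k, k) / two_m        # configuration null model p_ij
    delta = labels[:, None] == labels[None, :]
    return ((A - gamma * P) * delta).sum() / two_m

# Illustrative graph (an assumption): two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_all = modularity(A, [0, 0, 0, 0, 0, 0])     # a single community
print(q_split, q_all)
```

For this graph the two-triangle partition gives $Q = 5/14$, while putting every node in a single community gives $Q = 0$.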
Basic properties
Modularity
$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta_{ij}$
• Normalized: between −1 (bad partition) and 1 (good partition).
• Trivial partitions have modularity Q = 0.
• No community consists of a single node.
• Each community is internally connected.
• Modularity maximization is NP-hard (its decision version is NP-complete).
Examples: grid
Modularity tends to 1.
Examples: tree
Modularity tends to 1.
Examples: random graph
Random partition has $Q \sim 0$, but best partition has $Q \sim \frac{\langle \sqrt{k} \rangle}{\langle k \rangle} > 0$.
Problem 1: Incomparability
Modularity can be high even “without communities”
Compare modularity scores
(Dis)agreement on whether smoking causes cancer.
Scientific specialization
Example: ring of cliques
Modularity might merge cliques
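The merging can be checked numerically. The sketch below is an illustration under assumptions, not material from the slides: it takes a ring of `r` cliques of `s` nodes each, with one link between neighbouring cliques, and evaluates modularity in closed form as $Q = \sum_c \left[ e_c / m - (K_c / 2m)^2 \right]$, where $e_c$ is the number of links inside community $c$ and $K_c$ its total degree. The function name and parameters are hypothetical.

```python
def ring_of_cliques_modularity(r, s, group):
    """Modularity of a ring of r cliques (s nodes each, r >= 3, one link between
    neighbouring cliques) when every `group` consecutive cliques form one community."""
    assert r >= 3 and r % group == 0
    e_clique = s * (s - 1) // 2              # links inside one clique
    m = r * (e_clique + 1)                   # all links: cliques plus ring links
    if group == r:
        e_c = m                              # one community contains every link
    else:
        e_c = group * e_clique + (group - 1) # clique links plus ring links inside the group
    K_c = group * (2 * e_clique + 2)         # total degree of the nodes in one community
    n_comms = r // group
    return n_comms * (e_c / m - (K_c / (2 * m)) ** 2)

# Cliques of 5 nodes: compare one community per clique with pairs of merged cliques.
for r in (10, 30):
    q_separate = ring_of_cliques_modularity(r, 5, 1)
    q_pairs = ring_of_cliques_modularity(r, 5, 2)
    print(r, q_separate, q_pairs, "merge" if q_pairs > q_separate else "keep separate")
```

Working out the difference gives $Q_{\text{pairs}} - Q_{\text{separate}} = \frac{1}{s(s-1)+2} - \frac{1}{r}$, so merging pays off once the ring has more than $s(s-1)+2$ cliques: the preferred partition depends on the size of the whole network, not on the cliques themselves.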
Example: ring of rings
Modularity might split rings
Example: Cliques of different size
Instead of fixing γ = 1, tune γ so that the method finds the “correct” partition.
No γ yields the “correct” partition.
Example: broad community sizes
Competing constraints:
• Lower γ ⇒ merge smaller cliques.
• Higher γ ⇒ split large cliques.
No γ yields “correct” partition.
Problem 2: Resolution-limit
Size of communities might depend on size of network
Resolution limit revisited
Resolution-limit
Resolution-limit-free
• Problem is not merging per se.
• Rather, cliques are separate in a subgraph but merge in the large graph (or vice versa).
Definition (Resolution-limit-free)
An objective function $H$ is called resolution-limit-free if, whenever a partition $C$ is optimal for $G$, any subpartition $D \subset C$ is also optimal for the subgraph $H(D) \subset G$ induced by $D$.
Resolution-limit methods
General community detection
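$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta_{ij}$
Not resolution-limit-free
RB model: set $a_{ij} = 1 - \gamma_{RB} p_{ij}$, $b_{ij} = \gamma_{RB} p_{ij}$.
Modularity: set $p_{ij} = \frac{k_i k_j}{2m}$ and $\gamma_{RB} = 1$.
Which weights $a_{ij}$ and $b_{ij}$ make the model resolution-limit-free?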
Resolution-limit-free methods
General community detection
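$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta_{ij}$
Resolution-limit-free
RN model (Ronhovde-Nussinov): set $a_{ij} = 1$, $b_{ij} = \gamma_{RN}$.
CPM (constant Potts model): set $a_{ij} = 1 - \gamma$ and $b_{ij} = \gamma$. Leads to
$H = -\sum_{ij} (A_{ij} - \gamma) \delta_{ij}$.
Are there any other weights $a_{ij}$ and $b_{ij}$?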
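To see why CPM behaves differently from modularity, the same kind of ring-of-cliques comparison can be made for $H = -\sum_{ij} (A_{ij} - \gamma) \delta_{ij}$. This is a sketch under assumptions: the ring of cliques with one link between neighbouring cliques, the function name and the parameter values are illustrative. Summing over ordered pairs (including $i = j$), a community with $e_c$ internal links and $n_c$ nodes contributes $-(2 e_c - \gamma n_c^2)$.

```python
def cpm_ring_of_cliques(r, s, group, gamma):
    """CPM objective H for a ring of r cliques (s nodes each, r >= 3) when every
    `group` consecutive cliques form one community. Lower H is better."""
    assert r >= 3 and r % group == 0
    e_clique = s * (s - 1) // 2
    if group == r:
        e_c = r * (e_clique + 1)             # one community contains every link
    else:
        e_c = group * e_clique + (group - 1) # clique links plus ring links inside the group
    n_c = group * s                          # nodes per community
    n_comms = r // group
    return -n_comms * (2 * e_c - gamma * n_c ** 2)

# For cliques of s = 5 nodes, merging neighbours pays off only if gamma < 1 / s**2 = 0.04,
# regardless of how many cliques the ring contains.
for r in (10, 30, 1000):
    keeps_separate = cpm_ring_of_cliques(r, 5, 1, 0.1) < cpm_ring_of_cliques(r, 5, 2, 0.1)
    merges = cpm_ring_of_cliques(r, 5, 2, 0.01) < cpm_ring_of_cliques(r, 5, 1, 0.01)
    print(r, keeps_separate, merges)
```

Both values of $H$ scale linearly with `r`, so the comparison between separate and merged cliques never changes as the network grows. That is the behaviour the resolution-limit-free definition asks for.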
Local weights
General community detection
$H = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta_{ij}$
Definition (Local weights)
Weights are local when they depend only on the subgraph induced by $i$ and $j$.
Theorem (Local weights ⇒ resolution-limit-free)
Method is resolution-limit-free if weights are local.
Are local weights necessary?
“Almost” resolution-free ⇒ local weights
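[Figure: example graph labelled with $\alpha_1, \alpha_2, \alpha_3$ and $\beta_1, \beta_2, \beta_3, \beta_4$]
A method is not resolution-limit-free whenever communities are merged in the large graph but separate in the subgraph, or the other way around. Writing $m$ for the merged and $s$ for the separate partition, with primes for the subgraph:
$H_m < H_s$ and $H'_m > H'_s$, or
$H_m > H_s$ and $H'_m < H'_s$.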
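Working this out yields
$\alpha'_3 \left( \beta_4 (n_c - 1) + 2\beta_3 \right) < \alpha_3 \left( \beta'_4 (n_c - 1) + 2\beta'_3 \right)$, or
$\alpha'_3 \left( \beta_4 (n_c - 1) + 2\beta_3 \right) > \alpha_3 \left( \beta'_4 (n_c - 1) + 2\beta'_3 \right)$.
Always satisfied for non-local weights, except when
$n_c - 1 = 2 \, \frac{\beta'_3 \alpha_3 - \beta_3 \alpha'_3}{\beta_4 \alpha'_3 - \beta'_4 \alpha_3}$.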
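The exceptional value of $n_c$ is the boundary case in which neither strict inequality holds. A short derivation, assuming the denominator $\beta_4 \alpha'_3 - \beta'_4 \alpha_3$ is nonzero:

```latex
\begin{aligned}
\alpha'_3 \bigl( \beta_4 (n_c - 1) + 2\beta_3 \bigr)
  &= \alpha_3 \bigl( \beta'_4 (n_c - 1) + 2\beta'_3 \bigr) \\
(n_c - 1) \bigl( \beta_4 \alpha'_3 - \beta'_4 \alpha_3 \bigr)
  &= 2 \bigl( \beta'_3 \alpha_3 - \beta_3 \alpha'_3 \bigr) \\
n_c - 1
  &= 2 \, \frac{\beta'_3 \alpha_3 - \beta_3 \alpha'_3}{\beta_4 \alpha'_3 - \beta'_4 \alpha_3}
\end{aligned}
```

For any other $n_c$ one of the two strict inequalities holds.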
Louvain like algorithm
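[Figure: example network with numeric node labels]
1 Initialize node sizes $n_i = 1$.
2 Loop over all nodes (in random order) and calculate the improvement
$\Delta H(\sigma_i = c) = e_{i \leftrightarrow c} - 2\gamma n_i \sum_j n_j \delta(\sigma_j, c)$.
3 Create the community graph and repeat the procedure.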
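The three steps can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the function names, the dictionary-based graph format and the example graph are assumptions. For an undirected graph, $e_{i \leftrightarrow c}$ is taken as twice the link weight between node $i$ and community $c$, since each link is counted in both directions.

```python
import random

def local_moving(adj, sizes, gamma, seed=0):
    """Step 2: move single nodes greedily until no move improves the CPM objective."""
    rng = random.Random(seed)
    comm = {v: v for v in adj}            # every node starts in its own community
    comm_size = dict(sizes)               # summed node sizes per community label
    nodes = list(adj)
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for i in nodes:
            old = comm[i]
            comm_size[old] -= sizes[i]    # take i out of its community
            links = {}                    # link weight from i to each neighbouring community
            for j, w in adj[i].items():
                if j != i:                # self-loops do not depend on the assignment
                    links[comm[j]] = links.get(comm[j], 0.0) + w
            def gain(c):                  # improvement from putting i into community c
                return 2 * links.get(c, 0.0) - 2 * gamma * sizes[i] * comm_size[c]
            best, best_gain = old, gain(old)
            for c in links:
                if gain(c) > best_gain:
                    best, best_gain = c, gain(c)
            if best_gain < 0:             # being alone (gain 0) beats every option
                best = next(c for c, n in comm_size.items() if n == 0)
            comm[i] = best
            comm_size[best] += sizes[i]
            improved = improved or best != old
    return comm

def aggregate(adj, sizes, comm):
    """Step 3: build the community graph, summing link weights and node sizes."""
    new_adj, new_sizes = {}, {}
    for i in adj:
        row = new_adj.setdefault(comm[i], {})
        new_sizes[comm[i]] = new_sizes.get(comm[i], 0) + sizes[i]
        for j, w in adj[i].items():
            row[comm[j]] = row.get(comm[j], 0.0) + w
    return new_adj, new_sizes

def louvain_cpm(adj, gamma, seed=0):
    sizes = {v: 1 for v in adj}           # step 1: node sizes n_i = 1
    membership = {v: v for v in adj}      # original node -> current aggregate node
    while True:
        comm = local_moving(adj, sizes, gamma, seed)
        if len(set(comm.values())) == len(adj):
            return membership             # nothing merged in this round
        membership = {v: comm[c] for v, c in membership.items()}
        adj, sizes = aggregate(adj, sizes, comm)

# Illustrative graph (an assumption): two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
partition = louvain_cpm(adj, gamma=0.5)
print(partition)
```

With $\gamma = 0.5$ the two triangles end up in separate communities. The sketch omits refinements a production implementation would need, such as efficient bookkeeping of empty communities and support for directed or weighted null models.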
Performance (directed networks)
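[Figure: NMI (vertical axis, 0.25 to 1) versus µ (horizontal axis, 0 to 1.0) for networks with $n = 10^3$ and $n = 10^4$ nodes. Legend: CPM ($\gamma = \gamma^*$), ER ($\gamma = p$), RB Conf ($\gamma_{RB} = \gamma^*$), Mod. ($\gamma_{RB} = 1$), Inf.]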
Problems of CPM
No problems:
• Doesn’t merge cliques.
• Doesn’t split rings.
• Correctly detects cliques of different size.
Remaining problem:
• Communities of different density.
Conclusions
• Modularity can be high even when there are no communities.
• Modularity merges or splits communities unexpectedly (resolution limit).
• Methods using local weights are resolution-limit-free.
• The resolution-limit-free method (CPM) performs very well on the benchmark networks.
Open question:
• What if communities have different densities?
Thank you for your attention.
Questions?