Limits of community detection

Introduction Modularity Problem 1: Incomparability Problem 2: Resolution-limit Resolution-limit-free

Limits of community detection

V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)

(1) ICTEAM, Université catholique de Louvain; (2) CORE, Université catholique de Louvain

20 March 2012

Upload: vincent-traag

Post on 05-Dec-2014

DESCRIPTION

Presentation at seminar Université catholique de Louvain, March 20, 2012

TRANSCRIPT

Page 1: Limits of community detection

Limits of community detection

V.A. Traag (1), P. Van Dooren (1), Y.E. Nesterov (2)

(1) ICTEAM, Université catholique de Louvain

(2) CORE, Université catholique de Louvain

20 March 2012

Page 2: Limits of community detection

Outline

1 Introduction

2 Modularity

3 Problem 1: Incomparability

4 Problem 2: Resolution-limit

5 Resolution-limit-free

Page 4: Limits of community detection

Understanding Networks

Facebook (2011)

Page 6: Limits of community detection

Understanding Networks

Blogosphere Presidential Election (2004)

Page 7: Limits of community detection

Understanding Networks

[Figure: network of the people, parties, and organisations mentioned in the debate; node labels omitted]

Dutch Debate on Integration (2002-04)

Page 8: Limits of community detection

Community Detection

• Detect ‘natural’ communities in networks

• Basic idea: ‘relatively’ many links inside communities

Page 10: Limits of community detection

First principle approach

• Reward “good” links, penalize “bad” links.

• Assume the contributions of links within and between communities are equal.

• The objective then simplifies to a sum over internal links only.

Reward (+) or penalize (−) each pair of nodes, depending on whether the link is present or missing and whether it falls within or between communities:

           Present   Missing
Within        +         −
Between       −         +

Page 11: Limits of community detection

First principle approach

• Reward “good” links, penalize “bad” links.

• Assume the contributions of links within and between communities are equal.

• The objective then simplifies to a sum over internal links only.

With $A_{ij} = 1$ for a present link and $\delta_{ij} = 1$ for a pair of nodes in the same community, each reward (+) or penalty (−) gets its own weight:

                     $A_{ij} = 1$    $A_{ij} = 0$
$\delta_{ij} = 1$      $a_{ij}$       $-b_{ij}$
$\delta_{ij} = 0$     $-c_{ij}$        $d_{ij}$

Page 12: Limits of community detection

First principle approach

• Reward “good” links, penalize “bad” links.

• Assume the contributions of links within and between communities are equal.

• The objective then simplifies to a sum over internal links only.

Setting $c_{ij} = a_{ij}$ and $d_{ij} = b_{ij}$ (equal contributions within and between communities) leaves two weights:

                     $A_{ij} = 1$    $A_{ij} = 0$
$\delta_{ij} = 1$      $a_{ij}$       $-b_{ij}$
$\delta_{ij} = 0$     $-a_{ij}$        $b_{ij}$

Page 13: Limits of community detection

First principle approach

• Reward “good” links, penalize “bad” links.

• Assume the contributions of links within and between communities are equal.

• The objective then simplifies to a sum over internal links only.

Objective function

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$
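
As a minimal sketch of how this objective can be evaluated, the Python function below sums the weighted terms over all node pairs that share a community. The names `general_hamiltonian` and `adjacency`, the constant weights a = 1 and b = 0.5, and the choice to include the i = j terms in the sum are assumptions made for illustration; the slides do not fix them.

```python
def general_hamiltonian(A, community, a, b):
    """Evaluate H = -sum_ij (a_ij A_ij - b_ij (1 - A_ij)) delta_ij.

    A is a symmetric 0/1 adjacency matrix (list of lists), community[i] is
    the community label of node i, and a(i, j), b(i, j) return the weights.
    Assumption: the sum runs over all ordered pairs, including i == j.
    """
    n = len(A)
    H = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:  # delta_ij = 1
                H -= a(i, j) * A[i][j] - b(i, j) * (1 - A[i][j])
    return H


def adjacency(n, edges):
    """Build a symmetric adjacency matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


# Two triangles (0-1-2 and 3-4-5) joined by the single link 2-3.
A = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])


def a(i, j):
    return 1.0  # hypothetical reward for a present internal link


def b(i, j):
    return 0.5  # hypothetical penalty for a missing internal link


H_split = general_hamiltonian(A, [0, 0, 0, 1, 1, 1], a, b)
H_merged = general_hamiltonian(A, [0, 0, 0, 0, 0, 0], a, b)
print(H_split, H_merged)
```

Because rewards enter with a minus sign, the partition with the lower H is the better one; with these illustrative weights the two-triangle partition scores lower than the single-community partition.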

Page 14: Limits of community detection

Null-model

Objective function

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

• Introduce weights $a_{ij} = 1 - \gamma_{RB} p_{ij}$ and $b_{ij} = \gamma_{RB} p_{ij}$.

• Null-model $p_{ij}$, constrained by $\sum_{ij} p_{ij} = 2m$.

• The parameter $\gamma = \gamma_{RB}$ is known as the resolution parameter:

  - Higher $\gamma$ ⇒ smaller communities.

  - Lower $\gamma$ ⇒ larger communities.

• Leads to $H_{RB} = -\sum_{ij} (A_{ij} - \gamma p_{ij}) \delta_{ij}$.
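
The last step can be checked by substituting the weights into the general objective. The worked lines below are a sketch of that substitution and use only the symbols already defined on this slide.

```latex
\begin{align*}
a_{ij} A_{ij} - b_{ij} (1 - A_{ij})
  &= (1 - \gamma_{RB}\, p_{ij})\, A_{ij} - \gamma_{RB}\, p_{ij}\, (1 - A_{ij}) \\
  &= A_{ij} - \gamma_{RB}\, p_{ij} A_{ij} - \gamma_{RB}\, p_{ij} + \gamma_{RB}\, p_{ij} A_{ij} \\
  &= A_{ij} - \gamma_{RB}\, p_{ij}.
\end{align*}
```

The cross terms cancel, so each pair within a community contributes the observed link minus its scaled expectation under the null-model.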

Page 15: Limits of community detection

Configuration Null-model

Reichardt-Bornholdt objective function

$$H_{RB} = -\sum_{ij} (A_{ij} - \gamma p_{ij}) \delta_{ij}$$

• Rewire links, but keep degrees unchanged.

• The probability of a link between $i$ and $j$ is then $p_{ij} = \frac{k_i k_j}{2m}$.

[Figure: nodes $i$ and $j$ with degrees $k_i$ and $k_j$]

$\gamma = 1$ leads to modularity

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta_{ij} \sim -H.$$
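
The relation between the Reichardt-Bornholdt objective and modularity can be sketched in a few lines of Python. The function names `rb_hamiltonian`, `modularity`, and `adjacency` and the small example graph are hypothetical, and the double sum is assumed to run over all ordered pairs including i = j.

```python
def rb_hamiltonian(A, community, gamma=1.0):
    """H_RB = -sum_ij (A_ij - gamma * k_i k_j / 2m) delta_ij."""
    n = len(A)
    k = [sum(row) for row in A]   # node degrees
    two_m = sum(k)                # 2m = sum of all degrees
    H = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                H -= A[i][j] - gamma * k[i] * k[j] / two_m
    return H


def modularity(A, community):
    """Q = -H_RB / 2m evaluated at gamma = 1."""
    two_m = sum(sum(row) for row in A)
    return -rb_hamiltonian(A, community, gamma=1.0) / two_m


def adjacency(n, edges):
    """Build a symmetric adjacency matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


# Two triangles (0-1-2 and 3-4-5) joined by the single link 2-3.
A = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
Q_split = modularity(A, [0, 0, 0, 1, 1, 1])
Q_single = modularity(A, [0, 0, 0, 0, 0, 0])
print(Q_split, Q_single)
```

For this graph the two-triangle partition gives Q = 5/14, while placing all nodes in one community gives Q = 0 up to rounding.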

Page 17: Limits of community detection

Basic properties

Modularity

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta_{ij}$$

• Normalized: between −1 (bad partition) and 1 (good partition).

• The trivial partition with all nodes in one community has modularity $Q = 0$.

• In a partition of maximum modularity, no community consists of a single node.

• In a partition of maximum modularity, each community is internally connected.

• Modularity maximization is NP-hard (the decision version is NP-complete).

Page 19: Limits of community detection

Examples: grid

Modularity tends to 1.

Page 20: Limits of community detection

Examples: tree

Modularity tends to 1.

Page 21: Limits of community detection

Examples: random graph

Random partition has $Q \sim 0$, but the best partition has $Q \sim \frac{\langle \sqrt{k} \rangle}{\langle k \rangle} > 0$.

Page 22: Limits of community detection

Problem 1: Incomparability

Modularity can be high even “without communities”

Page 23: Limits of community detection

Compare modularity scores

(Dis)agreement on whether smoking causes cancer.

Page 24: Limits of community detection

Scientific specialization

Page 28: Limits of community detection

Example: ring of cliques

Modularity might merge cliques
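
The merging effect can be reproduced numerically. The sketch below builds a ring of r cliques of k nodes each, with one link between consecutive cliques, and compares the modularity of keeping every clique separate against merging adjacent cliques in pairs. The helper names and the parameter choices (k = 5, r = 10 and r = 30) are assumptions for illustration; modularity is computed in its equivalent per-community form, Q = Σ_c [e_c / m − (K_c / 2m)²], where e_c is the number of internal links and K_c the total degree of community c.

```python
def ring_of_cliques(r, k):
    """Edge list of r cliques of k nodes, consecutive cliques joined by one link."""
    edges = []
    for c in range(r):
        nodes = range(c * k, (c + 1) * k)
        edges += [(u, v) for u in nodes for v in nodes if u < v]
        # Link the last node of this clique to the first node of the next one.
        edges.append((c * k + k - 1, ((c + 1) % r) * k))
    return edges


def modularity(edges, community):
    """Q = sum_c [e_c / m - (K_c / 2m)^2] for an undirected edge list."""
    m = len(edges)
    internal = {}  # e_c: links inside community c
    degree = {}    # K_c: total degree of community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (K / (2 * m)) ** 2
               for c, K in degree.items())


def compare(r, k):
    """Return (Q with separate cliques, Q with adjacent cliques merged in pairs)."""
    edges = ring_of_cliques(r, k)
    separate = [v // k for v in range(r * k)]
    pairs = [v // (2 * k) for v in range(r * k)]  # requires even r
    return modularity(edges, separate), modularity(edges, pairs)


for r in (10, 30):
    q_separate, q_pairs = compare(r, 5)
    print(r, round(q_separate, 4), round(q_pairs, 4))
```

With 10 cliques the separate partition has the higher modularity, while with 30 cliques the merged pairs score higher, even though the cliques themselves are unchanged.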

Page 29: Limits of community detection

Example: ring of rings

Modularity might split rings

Page 30: Limits of community detection

Example: Cliques of different size

Do not fix γ = 1; instead, tune γ so that the method finds the “correct” partition.

No γ yields the “correct” partition.

Page 31: Limits of community detection

Example: broad community sizes

Competing constraints:

• Lower γ ⇒ merge smaller cliques.

• Higher γ ⇒ split large cliques.

No γ yields “correct” partition.

Page 34: Limits of community detection

Problem 2: Resolution-limit

Size of communities might depend on size of network

Page 36: Limits of community detection

Resolution limit revisited

Resolution-limit

Resolution-limit-free

• The problem is not merging per se.

• Rather, cliques are separate in a subgraph but merge in the large graph (or vice versa).

Page 37: Limits of community detection

Resolution limit revisited

Resolution-limit

Resolution-limit-free

Definition (Resolution-limit-free)

An objective function $H$ is called resolution-limit-free if, whenever a partition $C$ is optimal for $G$, each subpartition $D \subset C$ is also optimal for the subgraph $H(D) \subset G$ induced by $D$.

Page 38: Limits of community detection

Resolution-limit methods

General community detection

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

Not resolution-limit-free

RB model: set $a_{ij} = 1 - \gamma_{RB} p_{ij}$, $b_{ij} = \gamma_{RB} p_{ij}$.

Modularity: set $p_{ij} = \frac{k_i k_j}{2m}$ and $\gamma_{RB} = 1$.

Which weights $a_{ij}$ and $b_{ij}$ should be chosen so that the model is resolution-limit-free?

Page 39: Limits of community detection

Resolution-limit-free methods

General community detection

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

Resolution-limit-free

RN model: set $a_{ij} = 1$, $b_{ij} = \gamma_{RN}$.

CPM: set $a_{ij} = 1 - \gamma$ and $b_{ij} = \gamma$. Leads to

$$H = -\sum_{ij} (A_{ij} - \gamma) \delta_{ij}.$$

Are there any other weights $a_{ij}$ and $b_{ij}$?
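
To see why the constant CPM weights remove the dependence on network size, consider the following sketch. It rests on two assumptions that the slide does not spell out: the sum runs over ordered pairs, and the test case is a ring of cliques of $n_c$ nodes each, with consecutive cliques $c_1$ and $c_2$ joined by a single link. The first line substitutes the CPM weights; the second gives the change in $H$ when two adjacent cliques are merged.

```latex
\begin{align*}
a_{ij} A_{ij} - b_{ij} (1 - A_{ij})
  &= (1 - \gamma) A_{ij} - \gamma (1 - A_{ij}) = A_{ij} - \gamma, \\
\Delta H = H_{\text{merged}} - H_{\text{separate}}
  &= -2 \sum_{i \in c_1,\, j \in c_2} (A_{ij} - \gamma)
   = -2 \left( 1 - \gamma n_c^2 \right).
\end{align*}
```

Under these assumptions merging lowers $H$ only if $\gamma < 1 / n_c^2$. The condition involves the clique size alone, not the number of cliques or the total number of links.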

Page 40: Limits of community detection

Local weights

General community detection

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

[Figure: graph with nodes $i$ and $j$ marked]

Definition (Local weights)

Weights are local when they depend only on the subgraph induced by $i$ and $j$.

Page 41: Limits of community detection

Local weights

General community detection

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

[Figure: graph with nodes $i$ and $j$ marked]

Theorem (Local weights ⇒ resolution-limit-free)

A method is resolution-limit-free if its weights are local.

Page 42: Limits of community detection

Local weights

General community detection

$$H = -\sum_{ij} \bigl( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \bigr) \delta_{ij}$$

[Figure: graph with nodes $i$ and $j$ marked]

Are local weights necessary?

Page 43: Limits of community detection

“Almost” resolution-free ⇒ local weights

[Figure: graph with weights labelled $\alpha_1, \alpha_2, \alpha_3$ and $\beta_1, \beta_2, \beta_3, \beta_4$]

The method is not resolution-limit-free whenever communities are merged in the large graph but separate in the subgraph, or the other way around. So

$$H_m < H_s \text{ and } H'_m > H'_s, \quad \text{or} \quad H_m > H_s \text{ and } H'_m < H'_s.$$

Page 44: Limits of community detection

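“Almost” resolution-free ⇒ local weights

[Figure: graph with weights labelled $\alpha_1, \alpha_2, \alpha_3$ and $\beta_1, \beta_2, \beta_3, \beta_4$]

Working this out yields

$$\alpha'_3 \bigl( \beta_4 (n_c - 1) + 2\beta_3 \bigr) < \alpha_3 \bigl( \beta'_4 (n_c - 1) + 2\beta'_3 \bigr), \quad \text{or}$$

$$\alpha'_3 \bigl( \beta_4 (n_c - 1) + 2\beta_3 \bigr) > \alpha_3 \bigl( \beta'_4 (n_c - 1) + 2\beta'_3 \bigr).$$

Always satisfied for non-local weights, except when

$$(n_c - 1) = 2\, \frac{\beta'_3 \alpha_3 - \beta_3 \alpha'_3}{\beta_4 \alpha'_3 - \beta'_4 \alpha_3}.$$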

Page 45: Limits of community detection

Louvain-like algorithm

[Figure: illustration of the algorithm on an example graph; node labels omitted]

1 Init node sizes $n_i = 1$

2 Loop over all nodes (randomly), calculate improvement

$$\Delta H(\sigma_i = c) = \Bigl( e_{i \leftrightarrow c} - 2\gamma n_i \sum_j n_j \delta(\sigma_j, c) \Bigr),$$

3 Create community graph, repeat procedure
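
Steps 1 and 2 can be sketched for CPM as follows. This is a simplified illustration, not the authors' implementation: it handles only unit node sizes on an unweighted, undirected graph, it omits the aggregation of step 3, and the function name `cpm_local_moving` is hypothetical. It also counts each unordered pair once, so the assumed gain of placing node v in community c is e_{v↔c} − γ n_v n_c; the factor in front of γ depends on how pairs and links are counted.

```python
import random


def cpm_local_moving(adj, gamma, seed=0):
    """One level of greedy node moves for CPM.

    adj maps each node to the set of its neighbours. Returns a dict mapping
    each node to a community label. All node sizes are n_i = 1 (step 1).
    """
    rng = random.Random(seed)
    community = {v: v for v in adj}  # start from singleton communities
    size = {v: 1 for v in adj}       # number of nodes per community label
    nodes = list(adj)
    moved = True
    while moved:
        moved = False
        rng.shuffle(nodes)           # step 2: visit nodes in random order
        for v in nodes:
            old = community[v]
            size[old] -= 1           # temporarily take v out of its community
            links = {}               # links from v to each neighbouring community
            for u in adj[v]:
                links[community[u]] = links.get(community[u], 0) + 1
            # Assumed gain of joining c: (links to c) - gamma * n_v * n_c, n_v = 1.
            best, best_gain = old, links.get(old, 0) - gamma * size[old]
            for c, e in links.items():
                gain = e - gamma * size[c]
                if gain > best_gain:
                    best, best_gain = c, gain
            if best_gain < 0:
                # Being alone (gain 0) beats every option: use an empty label.
                best = next(c for c in size if size[c] == 0)
            community[v] = best
            size[best] += 1
            if best != old:
                moved = True
    return community


# Two triangles (0-1-2 and 3-4-5) joined by the single link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
partition = cpm_local_moving(adj, gamma=0.5)
print(partition)
```

Every accepted move strictly improves the assumed CPM quality, so the loop stops at a partition that no single-node move can improve; for this small graph and γ = 0.5 that partition consists of the two triangles.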

Page 50: Limits of community detection

Performance (directed networks)

[Figure: NMI (vertical axis, ticks 0.25 to 1) against µ (horizontal axis, 0 to 1.0). Legend: CPM ($\gamma = \gamma^*$), ER ($\gamma = p$), RB Conf ($\gamma_{RB} = \gamma^*$), RB Mod. ($\gamma_{RB} = 1$), Inf. Curves shown for $n = 10^3$ and $n = 10^4$.]

Page 51: Limits of community detection

Problems of CPM

No problems:

• Doesn’t merge cliques.

• Doesn’t split rings.

• Correctly detects cliques of different size.

Remaining problem:

• Communities of different density.

Page 52: Limits of community detection

Conclusions

• Modularity can be high even when there are no communities.

• Modularity merges or splits communities unexpectedly (resolution limit).

• Methods using local weights are resolution-limit-free.

• The resolution-limit-free method (CPM) performs very well.

Open question:

• What to do when communities have different densities?

Thank you for your attention.

Questions?