Clustered graph, visualization and hierarchical visualization Nathalie Villa-Vialaneix http://www.nathalievilla.org [email protected] Séminaire LIPN, Université Paris 13 November, 15th, 2012 Graph mining (Séminaire LIPN) Nathalie Villa-Vialaneix Paris, 11/15 2012 1 / 35


Clustered graph, visualization and hierarchical visualization

Nathalie Villa-Vialaneix http://www.nathalievilla.org

[email protected]

Séminaire LIPN, Université Paris 13

November, 15th, 2012


An overview on graph visualization and clustering

Framework

A graph (network) $G = (V, E, W)$ with

• $n$ vertices (nodes) $V = \{x_1, \dots, x_n\}$;

• edges $E$, weighted by $W_{ij} = W_{ji} \geq 0$ ($W_{ii} = 0$).


Network mining through visualization

A standard approach for network mining is to display the graph with a force-directed placement (FDP) algorithm, e.g., [Fruchterman and Reingold, 1991]:

• attractive forces: along the edges, analogous to springs;
• repulsive forces: between all pairs of nodes, analogous to electric forces.

The algorithm starts from an initial (random) position and iterates until the layout stabilizes.
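A minimal NumPy sketch of this scheme, in the spirit of [Fruchterman and Reingold, 1991]. It assumes the classical force laws (repulsion $k^2/d$ between all pairs, attraction $d^2/k$ along edges), a "temperature" that caps each move and cools linearly, and a fixed number of iterations instead of a convergence test; the name `fdp_layout` and all parameter values are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def fdp_layout(A, n_iter=200, t0=0.1, seed=0):
    """Force-directed placement in the spirit of Fruchterman and Reingold (1991).

    A: symmetric (n, n) adjacency (or weight) matrix with zero diagonal.
    Returns an (n, 2) array of node positions.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pos = rng.random((n, 2))            # random initial position in the unit square
    k = np.sqrt(1.0 / n)                # "ideal" edge length for a unit-area frame
    for it in range(n_iter):
        t = t0 * (1.0 - it / n_iter)    # temperature: maximum move, cooled linearly
        delta = pos[:, None, :] - pos[None, :, :]        # delta[i, j] = pos[i] - pos[j]
        dist = np.linalg.norm(delta, axis=-1)
        np.fill_diagonal(dist, 1.0)                       # delta[i, i] = 0 anyway
        dist = np.maximum(dist, 1e-9)
        # repulsion between all pairs, magnitude k^2 / d, pushing i away from j
        rep = (k ** 2 / dist ** 2)[:, :, None] * delta
        # attraction along edges only, magnitude d^2 / k, pulling i toward j
        att = (A * dist / k)[:, :, None] * delta
        disp = (rep - att).sum(axis=1)
        length = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / length * np.minimum(length, t)      # move at most t per step
    return pos
```

On two 4-cliques joined by a single edge, the clique members end up close to each other and the two groups drift apart, which is the kind of structure the user hopes to read off the drawing.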


Drawbacks of FDP algorithms

• they are slow (hard to use for very large graphs);

• they are oriented more toward aesthetics than toward an interpretable layout:
  • tendency: short edges with uniform lengths;
  • negative consequence: hubs are clustered in the center of the figure.

What the user usually prefers:

1 understanding the macroscopic structure of the graph, i.e., finding "communities" and their relations;

2 focusing on details for clusters that seem to be of interest.


Emphasizing “communities” in the layout

1 global approach: display all vertices while modifying the forces in such a way that dense areas are emphasized: [Noack, 2007] (LinLog algorithm);

2 clustering the vertices and then using a simplified representation of the graph [Herman et al., 2000]:
  • partition the nodes into clusters $V_1, \dots, V_C$;
  • display the clustered graph: nodes $V_1, \dots, V_C$ (with surface proportional to $|V_j|$) and edges with width proportional to $\sum_{x_k \in V_i,\, x_{k'} \in V_j} W_{kk'}$.
  Main issue: FDP must be modified to display nodes with different sizes.
  Alternative approach: displaying while clustering, as in Self-Organizing Maps [Boulet et al., 2008], [Rossi and Villa-Vialaneix, 2010] and [Olteanu et al., 2013];

3 combined approach: hierarchical representations where finer details are provided to the user [Auber et al., 2003, Auber and Jourdan, 2005, Seifi et al., 2010].
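The simplified representation of item 2 can be computed directly from $W$ and a partition. A minimal NumPy sketch, assuming the partition is given as one integer label per node (`cluster_graph` is a hypothetical name): node sizes are $|V_j|$ and the weight between clusters $i$ and $j$ is the sum of $W_{kk'}$ over $x_k \in V_i$, $x_{k'} \in V_j$.

```python
import numpy as np

def cluster_graph(W, labels):
    """Return (sizes, Wc): cluster sizes |V_j| and inter-cluster weights.

    Wc[i, j] = sum of W[k, k'] over x_k in V_i and x_k' in V_j.
    The diagonal Wc[i, i] counts each intra-cluster edge twice (ordered pairs).
    """
    labels = np.asarray(labels)
    n, C = len(labels), labels.max() + 1
    M = np.zeros((n, C))
    M[np.arange(n), labels] = 1.0      # one-hot membership matrix
    sizes = M.sum(axis=0)              # |V_j|: drives the node surface
    Wc = M.T @ W @ M                   # off-diagonal entries drive the edge widths
    return sizes, Wc
```

Only the off-diagonal entries of `Wc` are drawn as edges of the clustered graph; the layout of the $C$ cluster nodes itself still requires an FDP variant that accounts for node sizes.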


Outline of this talk

• self-organizing maps based on kernels and dissimilarities;
• modularity-based representations:
  • combined with a map;
  • used hierarchically.


Self-organizing maps approaches

Outline

1 Self-organizing maps approaches

2 Modularity based approaches

3 Soft modularity

4 Hierarchical clustering and visualization


Basic ideas about SOM

Project the graph on a square grid (each square of the grid is a cluster) such that:
• the nodes in the same cluster are highly connected;
• the nodes in two close clusters are also (less) connected;
• the nodes in two distant clusters are (almost) not connected.


Basics on Self-Organizing Maps (for multidimensional data)

• the map is made of neurons $1, \dots, M$ (visually symbolized by, e.g., rectangles), with which prototypes $p_j$ are associated (a prototype is a “representative” of the neuron in the original data space);

• the map is equipped with a neighborhood relationship, i.e., a “distance” (actually a dissimilarity) $D$ between neurons;

• goal: find the best mapping $f(x_i) \in \{1, \dots, M\}$ of the data $x_i$ onto the neurons by minimizing the energy
$$E = \sum_{i=1}^{n} \sum_{j=1}^{M} h\big(D(f(x_i), j)\big)\, \|x_i - p_j\|^2,$$
where $h$ is a decreasing neighborhood function, i.e., each observation is assigned to a neuron so that:
  • the neuron’s prototype is “close” to the observation;
  • the neighboring prototypes are also “close” to the observation;
  • distant prototypes are “far” from the observation
(topology preservation).
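The energy $E$ can be evaluated directly for a given assignment and set of prototypes. A NumPy sketch, assuming a rectangular grid whose neuron distance $D$ is the Manhattan distance between cells and a Gaussian neighborhood $h(d) = \exp(-d^2 / (2\sigma^2))$; neither choice is fixed by the slide, and the function names are hypothetical.

```python
import numpy as np

def grid_distance(rows, cols):
    """Manhattan distance D between the neurons of a rows x cols grid."""
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    return np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)

def som_energy(X, prototypes, f, D, sigma=1.0):
    """E = sum_i sum_j h(D(f(x_i), j)) * ||x_i - p_j||^2.

    X: (n, d) data, prototypes: (M, d), f: (n,) neuron index of each x_i.
    """
    H = np.exp(-D.astype(float) ** 2 / (2 * sigma ** 2))               # h applied to D
    sq = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (n, M)
    return float((H[f] * sq).sum())    # row i of H[f] is h(D(f(x_i), .))
```

A mapping that sends neighboring data to neighboring neurons keeps the large weights $h$ on small distances, which is how the energy encodes topology preservation.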


SOM, dissimilarity SOM and kernel SOM (batch)

Original SOM algorithm (batch): $x_1, \dots, x_n \in \mathbb{R}^d$

1: Initialization: randomly set $p^0_1, \dots, p^0_M$ in $\mathbb{R}^d$
2: for $l = 1 \to L$ do
3:   for all $i = 1 \to n$ do (Assignment)
4:     $f^l(x_i) \leftarrow \arg\min_{j=1,\dots,M} \|x_i - p^{l-1}_j\|_{\mathbb{R}^d}$
5:   end for
6:   for all $j = 1 \to M$ do (Representation)
7:     $p^l_j \leftarrow \arg\min_{p \in \mathbb{R}^d} \sum_{i=1}^{n} h^l\big(D(f^l(x_i), j)\big)\, \|x_i - p\|^2_{\mathbb{R}^d}$
8:   end for
9: end for

Problems with graphs: the $x_i$ are nodes, so 1/ how should the prototypes be defined? and 2/ which distance should be used between nodes?

[Villa and Rossi, 2007, Boulet et al., 2008]
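For Euclidean data, step 7 has a closed form: the minimizer is the weighted mean of the $x_i$ with weights $h^l(D(f^l(x_i), j))$. A NumPy sketch of the batch algorithm, assuming a rectangular grid with Manhattan distance between neurons, a Gaussian neighborhood whose width shrinks geometrically with $l$, and initialization from randomly chosen observations; these are illustrative choices and `batch_som` is a hypothetical name.

```python
import numpy as np

def batch_som(X, rows, cols, n_iter=20, sigma_end=0.5, seed=0):
    """Batch SOM for X (n, d) on a rows x cols grid. Returns (prototypes, assignments)."""
    rng = np.random.default_rng(seed)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    D = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)  # grid distance
    M = rows * cols
    # 1: initialization with randomly chosen observations
    P = X[rng.choice(len(X), size=M, replace=len(X) < M)].astype(float)
    sigma0 = max(max(rows, cols) / 2.0, sigma_end)
    for sigma in np.geomspace(sigma0, sigma_end, n_iter):    # shrinking h^l
        # 4: assignment to the closest prototype
        f = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # 7: the minimizer is a weighted mean with weights h^l(D(f(x_i), j))
        H = np.exp(-D[f].astype(float) ** 2 / (2.0 * sigma ** 2))   # (n, M)
        P = (H.T @ X) / H.sum(axis=0)[:, None]
    f = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return P, f
```

Both steps need coordinates of the $x_i$ (distances in step 4, a mean in step 7), which is exactly what is missing when the $x_i$ are nodes of a graph.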


Dissimilarity SOM (batch): $x_i \in G$ defined by a dissimilarity relation $\delta(x_i, x_j)$

1: Initialization: randomly set $p^0_1, \dots, p^0_M$ in $(x_i)_i$
2: for $l = 1 \to L$ do
3:   for all $i = 1 \to n$ do (Assignment)
4:     $f^l(x_i) \leftarrow \arg\min_{j=1,\dots,M} \delta(x_i, p^{l-1}_j)$
5:   end for
6:   for all $j = 1 \to M$ do (Representation)
7:     $p^l_j \leftarrow \arg\min_{p \in (x_i)_i} \sum_{i=1}^{n} h^l\big(D(f^l(x_i), j)\big)\, \delta(x_i, p)$
8:   end for
9: end for

[Kohonen and Somervuo, 1998, Kohonen and Somervuo, 2002]

[Villa and Rossi, 2007, Boulet et al., 2008]
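Restricting the prototypes to observations turns step 7 into a search over the $n$ candidates. A NumPy sketch, assuming any precomputed dissimilarity matrix (for instance shortest-path lengths between nodes), a given neuron distance matrix $D$ and a Gaussian neighborhood with geometrically shrinking width; `median_som` and the schedule are hypothetical.

```python
import numpy as np

def median_som(Delta, D, n_iter=20, sigma0=1.0, seed=0):
    """Batch dissimilarity SOM: prototypes are indices of observations.

    Delta: (n, n) dissimilarity matrix, D: (M, M) distances between neurons.
    Returns (prototype indices, assignments).
    """
    rng = np.random.default_rng(seed)
    n, M = Delta.shape[0], D.shape[0]
    protos = rng.choice(n, size=M, replace=n < M)          # 1: p_j^0 among the x_i
    for sigma in np.geomspace(sigma0, 0.5, n_iter):
        f = Delta[:, protos].argmin(axis=1)                 # 4: assignment
        H = np.exp(-D[f].astype(float) ** 2 / (2 * sigma ** 2))   # (n, M)
        # 7: cost[j, p] = sum_i H[i, j] * Delta[i, p]; keep the best candidate p
        cost = H.T @ Delta
        protos = cost.argmin(axis=1)
    return protos, Delta[:, protos].argmin(axis=1)
```

The price of this restriction is that prototypes can only sit on observed nodes, which the relational version below avoids.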


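Dissimilarity SOM (batch): $x_i \in G$ defined by a dissimilarity relation $\delta(x_i, x_j)$

1: Initialization: randomly set $p^0_j \leftarrow \sum_{i} \gamma^0_{ji} x_i$ (symbolic)
2: for $l = 1 \to L$ do
3:   for all $i = 1 \to n$ do (Assignment)
4:     $f^l(x_i) \leftarrow \arg\min_{j=1,\dots,M} \delta^2(x_i, p^{l-1}_j) = \big(\Delta \gamma^{l-1}_j\big)_i - \frac{1}{2} (\gamma^{l-1}_j)^T \Delta \gamma^{l-1}_j$, where $\Delta = (\delta(x_k, x_{k'}))_{k,k'}$
5:   end for
6:   for all $j = 1 \to M$ do (Representation)
7:     $\gamma^l_j \leftarrow \arg\min_{\gamma \in \mathbb{R}^n} \sum_{i=1}^{n} h^l\big(D(f^l(x_i), j)\big)\, \delta^2\big(x_i, \sum_{k=1}^{n} \gamma_k x_k\big)$
8:   end for
9: end for

[Rossi et al., 2007]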

[Villa and Rossi, 2007, Boulet et al., 2008]
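The assignment formula comes from the Euclidean identity $\|x_i - \sum_k \gamma_k x_k\|^2 = \sum_k \gamma_k \|x_i - x_k\|^2 - \frac{1}{2}\sum_{k,k'} \gamma_k \gamma_{k'} \|x_k - x_{k'}\|^2$, valid when $\sum_k \gamma_k = 1$. The sketch below therefore assumes that $\Delta$ stores squared dissimilarities and that each $\gamma_j$ sums to one, and checks the formula against explicit Euclidean distances. For step 7 it uses $\gamma_j$ proportional to the neighborhood weights, the relational analogue of the weighted mean; the function names are hypothetical.

```python
import numpy as np

def relational_sq_dist(Delta, gamma):
    """delta^2(x_i, p_j) = (Delta gamma_j)_i - 1/2 gamma_j^T Delta gamma_j.

    Delta: (n, n) squared dissimilarities; gamma: (M, n), rows summing to one.
    Returns an (n, M) matrix.
    """
    DG = Delta @ gamma.T                                         # (Delta gamma_j)_i
    quad = 0.5 * np.einsum("jk,kl,jl->j", gamma, Delta, gamma)   # one value per j
    return DG - quad[None, :]

def relational_prototypes(H):
    """Step 7: gamma_j proportional to the weights H[:, j] = h(D(f(x_i), j))."""
    return (H / H.sum(axis=0, keepdims=True)).T
```

In the Euclidean case these coefficients reproduce the batch SOM prototypes exactly; for a general dissimilarity the prototypes remain "symbolic" and are never formed explicitly.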


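Kernel SOM (batch): $x_i \in G$ defined by a kernel relation $K(x_i, x_j)$ $\Rightarrow$ $\exists\, \phi : G \to (\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ such that $K(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$

1: Initialization: randomly set $p^0_j \leftarrow \sum_{i=1}^{n} \gamma^0_{ji} \phi(x_i)$
2: for $l = 1 \to L$ do
3:   for all $i = 1 \to n$ do (Assignment)
4:     $f^l(x_i) \leftarrow \arg\min_{j=1,\dots,M} \|\phi(x_i) - p^{l-1}_j\|_{\mathcal{H}}$, where $\|\phi(x_i) - p^{l-1}_j\|^2_{\mathcal{H}} = K(x_i, x_i) - 2 \sum_{k=1}^{n} \gamma^{l-1}_{jk} K(x_i, x_k) + \sum_{k,k'=1}^{n} \gamma^{l-1}_{jk} \gamma^{l-1}_{jk'} K(x_k, x_{k'})$ (the first term does not depend on $j$ and can be dropped in the arg min)
5:   end for
6:   for all $j = 1 \to M$ do (Representation)
7:     $\gamma^l_j \leftarrow \arg\min_{\gamma \in \mathbb{R}^n} \sum_{i=1}^{n} h^l\big(D(f^l(x_i), j)\big)\, \|\phi(x_i) - \sum_{k=1}^{n} \gamma_k \phi(x_k)\|^2_{\mathcal{H}}$
8:   end for
9: end for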

[Villa and Rossi, 2007, Boulet et al., 2008]
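Only the Gram matrix $(K(x_k, x_{k'}))_{k,k'}$ is needed to compute these distances. A NumPy sketch (hypothetical function name), checked against explicit feature vectors with a linear kernel, for which $\phi$ is the identity:

```python
import numpy as np

def kernel_sq_dist(K, gamma):
    """||phi(x_i) - p_j||_H^2 with p_j = sum_k gamma_jk phi(x_k).

    K: (n, n) Gram matrix; gamma: (M, n) real coefficients.
    Returns an (n, M) matrix.
    """
    diag = np.diag(K)[:, None]                          # K(x_i, x_i), independent of j
    cross = K @ gamma.T                                 # sum_k gamma_jk K(x_i, x_k)
    quad = np.einsum("jk,kl,jl->j", gamma, K, gamma)    # gamma_j^T K gamma_j
    return diag - 2 * cross + quad[None, :]
```

Unlike the relational formula, this expansion holds for any real coefficients $\gamma_j$, since it only uses the bilinearity of the inner product in $\mathcal{H}$.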


Dissimilarity SOM (stochastic)

(Online relational SOM) [Olteanu et al., 2013]

1: Initialization: randomly set $\gamma^0_{ji}$ in $\mathbb{R}$
2: for $l = 1 \to L$ do
3:   Randomly choose an input $x_i$
4:   Assignment: $f^l(x_i) \leftarrow \arg\min_{j=1,\dots,M} \big(\gamma^{l-1}_j \Delta\big)_i - \frac{1}{2} \gamma^{l-1}_j \Delta (\gamma^{l-1}_j)^T$
5:   for all $j = 1 \to M$ do (Update of the prototypes)
6:     $\gamma^l_j \leftarrow \gamma^{l-1}_j + \alpha^l h^l\big(D(f^l(x_i), j)\big) \big(\mathbb{1}_i - \gamma^{l-1}_j\big)$, where $\mathbb{1}_i$ is the vector with a single nonzero coefficient, equal to one, at the $i$th position
7:   end for
8: end for
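A minimal NumPy sketch of the stochastic version, assuming that $\Delta$ holds squared dissimilarities, that the $\gamma_j$ start as random convex combinations (rows summing to one, a property the update preserves), a linearly decreasing step $\alpha^l$ and a Gaussian neighborhood whose width shrinks geometrically. These schedules and the name `online_relational_som` are illustrative assumptions.

```python
import numpy as np

def online_relational_som(Delta, D, n_iter=2000, alpha0=0.5, sigma0=1.0, seed=0):
    """Stochastic relational SOM. Delta: (n, n) squared dissimilarities,
    D: (M, M) distances between neurons. Returns (gamma, assignments)."""
    rng = np.random.default_rng(seed)
    n, M = Delta.shape[0], D.shape[0]
    gamma = rng.random((M, n))
    gamma /= gamma.sum(axis=1, keepdims=True)     # prototypes as convex combinations
    for l in range(n_iter):
        frac = l / n_iter
        alpha = alpha0 * (1 - frac)                   # decreasing step size alpha^l
        sigma = sigma0 * (0.1 / sigma0) ** frac       # shrinking neighborhood h^l
        i = rng.integers(n)                           # 3: random input x_i
        # 4: (gamma_j Delta)_i - 1/2 gamma_j Delta gamma_j^T for every j
        d2 = gamma @ Delta[:, i] - 0.5 * np.einsum("jk,kl,jl->j", gamma, Delta, gamma)
        bmu = d2.argmin()
        h = np.exp(-D[bmu].astype(float) ** 2 / (2 * sigma ** 2))
        # 6: gamma_j <- gamma_j + alpha h_j (1_i - gamma_j); row sums stay equal to one
        target = np.zeros(n)
        target[i] = 1.0
        gamma += (alpha * h)[:, None] * (target[None, :] - gamma)
    quad = np.einsum("jk,kl,jl->j", gamma, Delta, gamma)
    return gamma, (Delta @ gamma.T - 0.5 * quad[None, :]).argmin(axis=1)
```

Each step touches a single input, so the cost per iteration is dominated by the quadratic terms $\gamma_j \Delta \gamma_j^T$; a real implementation would cache them instead of recomputing them as done here for readability.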


Which dissimilarities/kernels for graphs?

Laplacian [Kondor and Lafferty, 2002]

For a graph with vertices $V = \{x_1, \dots, x_n\}$ and (positive, symmetric) weights $(w_{ij})_{i,j=1,\dots,n}$, the Laplacian is $L = (L_{ij})_{i,j=1,\dots,n}$ where
$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \neq j \\ d_i = \sum_{j \neq i} w_{ij} & \text{if } i = j \end{cases}$$

1 Diffusion matrix [Kondor and Lafferty, 2002]: for $\beta > 0$, $K_\beta = e^{-\beta L} = \sum_{k=0}^{+\infty} \frac{(-\beta L)^k}{k!}$ (heat kernel, or diffusion kernel);

2 Generalized inverse of the Laplacian [Fouss et al., 2007]: $K = L^+$;

3 Dissimilarity: length of the shortest path between two nodes.
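These three matrices can be computed with NumPy alone. A sketch, assuming a dense symmetric weight matrix with zero diagonal; the heat kernel is obtained from the eigendecomposition of the symmetric $L$ rather than from the series, and the shortest-path dissimilarity counts edges (weights are treated as similarities, not lengths), which is an assumption. Function names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """L = diag(d) - W, with d_i = sum_{j != i} w_ij (W has a zero diagonal)."""
    return np.diag(W.sum(axis=1)) - W

def heat_kernel(W, beta):
    """K_beta = exp(-beta L) = U diag(exp(-beta lambda)) U^T for symmetric L."""
    lam, U = np.linalg.eigh(laplacian(W))
    return (U * np.exp(-beta * lam)) @ U.T

def laplacian_pinv(W):
    """Moore-Penrose generalized inverse L^+ of the Laplacian."""
    return np.linalg.pinv(laplacian(W))

def shortest_path_lengths(W):
    """Number of edges on a shortest path (Floyd-Warshall with unit edge lengths)."""
    n = W.shape[0]
    dist = np.where(W > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist
```

Since $L\mathbf{1} = 0$, every row of $e^{-\beta L}$ sums to one; $\beta$ controls how far the diffusion spreads, which is the "kernel parameter for the heat kernel" varied later in the comparison.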


A first example: a medieval social network

Example from [Boulet et al., 2008], http://graphcomp.univ-tlse2.fr/. The “Archives départementales du Lot” (Cahors, France) hold a large corpus of 5,000 transactions (mostly land charters)

• coming from 4 “seigneuries” (about 25 small villages) in South-West France;

• established between 1240 and 1520 (just before and after the Hundred Years’ War).


Simplification of this network by kernel SOM

nodes: individuals (≃ 600) named in the transactions, restricted to transactions established before the HYW (Hundred Years’ War); edges: two individuals are linked if they are named in a common transaction or have a common lord

Kernel SOM with heat kernel


A brief comparison with spectral clustering

                               Kernel SOM   Spectral clustering   Spectral clustering
Number of clusters             35           50                    29
Maximum size of the clusters   255          268                   325
Modularity                     0.597        0.420                 0.433


Online relational SOM (faster)

Description:

• nodes: 105 American political books;

• edges weighted by the number of co-purchases of the two books on the internet (Amazon.com).

FDP representation


Modularity based approaches


Modularity [Newman and Girvan, 2004]

Popular quality measure for graph clustering: a partition of the vertices into C clusters $(C_k)_{k=1,\dots,C}$ has modularity
$$Q(\mathcal{C}) = \frac{1}{2m} \sum_{k=1}^{C} \sum_{i,j \in C_k} (W_{ij} - P_{ij})$$
where the $P_{ij}$ are weights corresponding to a “null model” in which the weights depend only on the properties of the nodes and not on the clusters they belong to. More precisely,
$$P_{ij} = \frac{d_i d_j}{2m}$$
where $d_i = \sum_{j \neq i} W_{ij}$ is the degree of vertex $x_i$ and $m = \frac{1}{2} \sum_i d_i$ is the total weight of the edges.

A “good” clustering should maximize Q.
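A direct NumPy transcription of Q (hypothetical function name), assuming a symmetric $W$ with zero diagonal and one integer cluster label per node; as in the formula, the sum over $i, j \in C_k$ includes $i = j$:

```python
import numpy as np

def modularity(W, labels):
    """Q = 1/(2m) * sum_k sum_{i,j in C_k} (W_ij - d_i d_j / (2m))."""
    labels = np.asarray(labels)
    d = W.sum(axis=1)                    # degrees d_i
    two_m = d.sum()                      # 2m = sum_i d_i
    B = W - np.outer(d, d) / two_m       # W - P
    same = labels[:, None] == labels[None, :]
    return float(B[same].sum() / two_m)
```

For two disjoint triangles split into their natural clusters, each cluster contributes $6 - 36/12 = 3$ and $Q = 6/12 = 0.5$; putting every node in a single cluster gives $Q = 0$.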


Interpretation

• Q increases when pairs $(x_i, x_j)$ in the same cluster have a true weight $W_{ij}$ greater than the one expected in the null model, $P_{ij}$;

• Q increases when pairs $(x_i, x_j)$ in two different clusters have a true weight $W_{ij}$ smaller than the one expected in the null model, $P_{ij}$, because
$$Q(\mathcal{C}) + \frac{1}{2m} \sum_{k \neq k'} \sum_{i \in C_k,\, j \in C_{k'}} (W_{ij} - P_{ij}) = 0;$$

• contrary to minimizing the number of edges between clusters, maximizing modularity makes it easier to separate nodes with high degrees into different clusters.
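The identity used in the second bullet follows in one line from the fact that, under this null model, $W$ and $P$ carry the same total weight (with $2m = \sum_i d_i$ and $W_{ii} = 0$):

```latex
\sum_{i,j} P_{ij} = \frac{1}{2m}\Big(\sum_i d_i\Big)\Big(\sum_j d_j\Big) = \frac{(2m)^2}{2m} = 2m = \sum_{i,j} W_{ij},
\qquad\text{so}\qquad
0 = \frac{1}{2m}\sum_{i,j} (W_{ij} - P_{ij})
  = Q(\mathcal{C}) + \frac{1}{2m}\sum_{k \neq k'} \sum_{i \in C_k,\, j \in C_{k'}} (W_{ij} - P_{ij}).
```

The between-cluster term is therefore exactly $-Q(\mathcal{C})$: every gain in Q corresponds to between-cluster pairs whose weights fall further below the null model.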


Drawing optimized clustering

Combine:

• high modularity, to ensure high intra-cluster density and low external connectivity;

• few edge crossings;

by:

• the classic solution: relying on a graph drawing algorithm after maximizing the modularity;

• extending the modularity to a criterion adapted to a prior structure (like a grid).
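The number of edge crossings depends on the drawing. A sketch that counts pairs of straight-line edges that properly cross, given node coordinates (for instance cluster positions on a grid); edges sharing an endpoint and merely touching collinear segments are not counted, a convention assumed here. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): +1, -1, or 0 if collinear."""
    return np.sign((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

def count_crossings(pos, edges):
    """Number of pairs of straight-line edges that properly cross.

    pos: (n, 2) node coordinates; edges: list of (u, v) index pairs.
    """
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:        # edges sharing an endpoint never count
            continue
        p, q, r, s = pos[a], pos[b], pos[c], pos[d]
        # each segment must separate the two endpoints of the other one
        if (_orient(p, q, r) * _orient(p, q, s) < 0
                and _orient(r, s, p) * _orient(r, s, q) < 0):
            count += 1
    return count
```

The pairwise loop is quadratic in the number of edges, which is acceptable for the clustered graphs considered here (a few dozen cluster nodes at most).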


Soft modularity


Self Organizing Map principle

For data in $\mathbb{R}^d$, SOM minimizes (over the clustering and the prototypes $(p_j)$)
$$\sum_{j=1}^{M} \sum_{i=1}^{n} S_{f(x_i), j}\, \|x_i - p_j\|^2_{\mathbb{R}^d}$$
where $S_{kl}$ encodes the prior structure: close to 1 for close clusters and close to 0 for distant clusters.

This corresponds to a soft membership: $x_i$ belongs to $C_j$ with membership $S_{f(x_i), j}$.


Organized modularity [Rossi and Villa-Vialaneix, 2010]

Same idea: encode a prior structure via a matrix S and maximize
$$SQ = \frac{1}{2m} \sum_{i,j} S_{f(x_i) f(x_j)} (W_{ij} - P_{ij})$$

Hence:

• if a pair of vertices $(x_i, x_j)$ is such that $W_{ij} > P_{ij}$, SQ increases with the closeness of $f(x_i)$ and $f(x_j)$ in the prior structure;

• if a pair of vertices $(x_i, x_j)$ is such that $W_{ij} < P_{ij}$, SQ increases if $f(x_i)$ and $f(x_j)$ are distant in the prior structure.
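A NumPy sketch of SQ (hypothetical names), assuming a Gaussian prior $S_{kl} = \exp(-D(k,l)^2 / (2\sigma^2))$ built from the Manhattan distance between the cells of a grid; with $S = I$, SQ reduces to the modularity Q.

```python
import numpy as np

def organized_modularity(W, f, S):
    """SQ = 1/(2m) * sum_{i,j} S[f(x_i), f(x_j)] * (W_ij - P_ij)."""
    f = np.asarray(f)
    d = W.sum(axis=1)
    two_m = d.sum()
    B = W - np.outer(d, d) / two_m              # W - P
    return float((S[np.ix_(f, f)] * B).sum() / two_m)

def grid_prior(rows, cols, sigma=1.0):
    """Hypothetical prior: S_kl = exp(-D(k, l)^2 / (2 sigma^2)) on a rows x cols grid."""
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    D = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return np.exp(-D.astype(float) ** 2 / (2 * sigma ** 2))
```

For two disjoint triangles placed on two adjacent cells, the between-cluster pairs (where $W_{ij} < P_{ij}$) are weighted by $e^{-1/2}$ instead of 0, so SQ drops from $0.5$ to $0.5\,(1 - e^{-1/2})$: the criterion rewards placing unrelated communities far apart.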


Optimization

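The clustering is represented by an $n \times C$ assignment matrix $M$ with $M_{ik} = \delta_{f(x_i)=k}$ (1 if $x_i$ is in cluster $k$, 0 otherwise). The goal is then to maximize
$$SQ = F(M) = \frac{1}{2m} \sum_{i,j} \sum_{k,l} M_{ik} S_{kl} M_{jl} (W_{ij} - P_{ij})$$

The combinatorial problem is NP-complete ⇒ use of a deterministic annealing algorithm.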
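A simplified mean-field annealing scheme gives an idea of how $F(M)$ can be maximized deterministically over relaxed (soft) assignments before hardening them. This is a sketch written for illustration, not the algorithm of [Rossi and Villa-Vialaneix, 2010]; the function names, temperature schedule and perturbation level are assumptions. It uses the matrix form $F(M) = \frac{1}{2m}\,\mathrm{tr}(M^T B M S)$ with $B = W - P$, whose gradient is $\frac{2}{2m} B M S$ because $B$ and $S$ are symmetric.

```python
import numpy as np

def organized_F(M, B, S, two_m):
    """F(M) = 1/(2m) * tr(M^T B M S), the matrix form of the double sum."""
    return float(np.trace(M.T @ B @ M @ S) / two_m)

def anneal_organized_modularity(W, S, T0=1.0, T_min=0.01, cooling=0.9,
                                inner=50, noise=1e-3, seed=0):
    """Mean-field annealing sketch: soft assignments, cooled until nearly 0/1."""
    rng = np.random.default_rng(seed)
    n, C = W.shape[0], S.shape[0]
    d = W.sum(axis=1)
    two_m = d.sum()
    B = W - np.outer(d, d) / two_m               # W - P
    M = np.full((n, C), 1.0 / C)                 # start from uniform soft assignments
    T = T0
    while T > T_min:
        # small random perturbation so that symmetric solutions can split
        M = (1 - noise) * M + noise * rng.dirichlet(np.ones(C), size=n)
        for _ in range(inner):
            field = 2.0 * (B @ M @ S) / two_m    # gradient of F with respect to M
            logits = field / T
            logits -= logits.max(axis=1, keepdims=True)
            M = np.exp(logits)
            M /= M.sum(axis=1, keepdims=True)    # softmax over clusters, row by row
        T *= cooling                             # geometric cooling
    labels = M.argmax(axis=1)
    hard = np.eye(C)[labels]                     # back to a 0/1 assignment matrix
    return labels, M, organized_F(hard, B, S, two_m)
```

At high temperature the soft assignments stay nearly uniform; as the temperature decreases, they split along the community structure and become almost binary, which avoids committing early to a poor combinatorial choice.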


Comparison on the co-appearance network from “Les Misérables”

Co-appearance network from “Les Misérables” [Knuth, 1993]

[Figure: the co-appearance network, with nodes labeled by character name (Myriel, Valjean, Fantine, Cosette, Javert, Marius, ...)]

77 nodes, density = 8.7%, transitivity = 49.9%
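Density and transitivity, as quoted for this network, can be computed from the adjacency matrix. A sketch assuming the usual definitions (density: fraction of the $n(n-1)/2$ possible edges that are present; transitivity: 3 × number of triangles / number of connected triples) for an unweighted, undirected graph; function names are hypothetical.

```python
import numpy as np

def density(A):
    """Fraction of the n(n-1)/2 possible edges present (undirected, no loops)."""
    n = A.shape[0]
    n_edges = np.count_nonzero(np.triu(A, k=1))
    return n_edges / (n * (n - 1) / 2)

def transitivity(A):
    """3 * triangles / connected triples = trace(A^3) / sum_i d_i (d_i - 1)."""
    A = (A > 0).astype(float)
    d = A.sum(axis=1)
    closed = np.trace(A @ A @ A)          # equals 6 * number of triangles
    triples = (d * (d - 1)).sum()         # equals 2 * number of connected triples
    return closed / triples
```

For 77 nodes, a density of 8.7% corresponds to about $0.087 \times 77 \times 76 / 2 \approx 255$ edges.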


Methodology

Comparison of:

• kernel SOM with various kernels: heat kernel, generalized inverse of the Laplacian, modularity kernel (i.e., the positive part of W − P, which mimics the optimization of the modularity) and spectral SOM (based on the first M eigenvectors of the Laplacian);

• SQ optimization.

Parameters varied:

• size of the prior grid or number of clusters;

• for organized clusterings, type of neighborhood on the grid;

• for SOM, random or PCA initialization and kernel parameter for the heat kernel.

Selection of the solutions: Pareto points according to modularity and number of edge crossings.
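Selecting Pareto points only requires a dominance test: a solution is kept if no other solution has a modularity at least as high and at most as many edge crossings, with at least one of the two strictly better. A plain-Python sketch (hypothetical function name):

```python
def pareto_points(solutions):
    """Indices of the non-dominated solutions.

    solutions: list of (modularity, crossings) pairs; modularity is maximized
    and the number of edge crossings is minimized.
    """
    front = []
    for i, (q_i, c_i) in enumerate(solutions):
        dominated = any(
            q_j >= q_i and c_j <= c_i and (q_j > q_i or c_j < c_i)
            for j, (q_j, c_j) in enumerate(solutions) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

For instance, among (0.50, 10), (0.55, 3), (0.40, 1), (0.50, 3) and (0.30, 5), only (0.55, 3) and (0.40, 1) are kept: each is the best on one criterion, and neither is beaten on both.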


A brief comment on the kernel SOM solutions

[Figure: for four kernels (generalized inverse, heat kernel, modularity kernel, spectral SOM), number of crossing edges (0 to 500) versus modularity (0.0 to 0.6) of the kernel SOM solutions, for grid sizes 3, 4 and 5; Pareto points are marked in the generalized inverse and heat kernel panels]

Spectral SOM and the modularity kernel obtain poor results.

Page 58: Document

Soft modularity

Analysis of the Pareto points for “Les Misérables”

Method          | Number of clusters | Modularity | Nb of pairs of crossing edges
Organized mod.  | 4² (7)             | 0.5638     | 1
Organized mod.  | 5² (7)             | 0.5652     | 3
                | 3² (6)             | 0.5472     | 0
kernel SOM (HK) | 5² (22)            | 0.5327     | 27
                | 3² (8)             | 0.5276     | 2
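Assuming the modularity column uses the standard definition of [Newman and Girvan, 2004], it can be computed for any hard partition as sketched below. This is a minimal illustration that assumes edges are given as (u, v, weight) triples without self-loops (W_ii = 0). The name `modularity` and the two-triangle example graph are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable, float]],
               labels: Dict[Hashable, int]) -> float:
    """Modularity of a hard partition of an undirected weighted graph.

    Q = sum_k [ in_k / (2m) - (tot_k / (2m))^2 ], where in_k is the weight
    of the edges inside cluster k (each edge counted in both directions),
    tot_k is the sum of the degrees of the nodes of cluster k and 2m is the
    sum of all degrees.
    """
    inside = defaultdict(float)
    total = defaultdict(float)
    two_m = 0.0
    for u, v, w in edges:
        two_m += 2 * w
        total[labels[u]] += w
        total[labels[v]] += w
        if labels[u] == labels[v]:
            inside[labels[u]] += 2 * w
    return sum(inside[k] / two_m - (total[k] / two_m) ** 2 for k in total)


# Two triangles joined by one edge, each triangle being a cluster.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, labels), 4))  # 0.3571
```

Putting every node in a single cluster gives a modularity of 0, which is why values around 0.55 indicate a marked cluster structure.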

Page 60: Document

Soft modularity Hierarchical clustering and visualization

Outline

1 Self-organizing maps approaches

2 Modularity based approaches

3 Soft modularity

Hierarchical clustering and visualization


Page 61: Document

Soft modularity Hierarchical clustering and visualization

Global overview

[Rossi and Villa-Vialaneix, 2011]: two combined steps:

• Build a clustering hierarchy (by repeating modularity optimization within clusters), with a test of the significance of the partition at each step;

• Display the different levels of the hierarchy with a modified force-directed placement algorithm.


Page 62: Document

Soft modularity Hierarchical clustering and visualization

A hierarchy of clustering

Aim: reduce the effect of the resolution limit of modularity (see [Fortunato, 2010]).

How? Iterate the modularity optimization within each cluster. The modularity is optimized by a greedy algorithm with multi-level refinement similar to that of [Noack and Rotta, 2009].

Step 1

Step 2

Step 3
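The iterated optimization can be illustrated with a deliberately simplified Python sketch. It replaces the multi-level refinement of [Noack and Rotta, 2009] with plain agglomerative greedy merging (start from singletons, then merge the pair of clusters with the largest modularity gain while that gain is positive), and it replaces the significance test with a fixed maximum depth. The names `greedy_modularity`, `hierarchy` and `max_depth` are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[int, int, float]


def greedy_modularity(nodes: List[int], edges: List[Edge]) -> List[List[int]]:
    """Agglomerative greedy modularity optimization, without refinement.

    Starts from singletons and repeatedly merges the pair of connected
    clusters with the largest modularity gain, until no merge increases
    the modularity.
    """
    two_m = 2.0 * sum(w for _, _, w in edges)
    if two_m == 0:
        return [[v] for v in nodes]
    cluster = {v: v for v in nodes}        # node -> cluster id
    members = {v: [v] for v in nodes}      # cluster id -> list of nodes
    tot = defaultdict(float)               # cluster id -> sum of degrees
    for u, v, w in edges:
        tot[u] += w
        tot[v] += w
    while True:
        between = defaultdict(float)       # (a, b), a < b -> weight between a and b
        for u, v, w in edges:
            a, b = cluster[u], cluster[v]
            if a != b:
                between[(min(a, b), max(a, b))] += w
        best_gain, best_pair = 0.0, None
        for (a, b), w_ab in between.items():
            # Modularity gain of merging clusters a and b.
            gain = 2 * w_ab / two_m - 2 * tot[a] * tot[b] / two_m ** 2
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return [sorted(m) for m in members.values()]
        a, b = best_pair
        for v in members[b]:
            cluster[v] = a
        members[a].extend(members.pop(b))
        tot[a] += tot.pop(b)


def hierarchy(nodes: List[int], edges: List[Edge], depth: int = 0,
              max_depth: int = 3) -> list:
    """Re-run the optimization inside each cluster of the previous level.

    Returns a nested list; a leaf is the sorted list of the nodes of a
    cluster that is not split further.
    """
    parts = greedy_modularity(nodes, edges)
    if len(parts) <= 1 or depth >= max_depth:
        return sorted(nodes)
    children = []
    for part in parts:
        keep = set(part)
        sub_edges = [(u, v, w) for u, v, w in edges if u in keep and v in keep]
        children.append(hierarchy(part, sub_edges, depth + 1, max_depth))
    return children


edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
print(hierarchy(list(range(6)), edges))  # [[0, 1, 2], [3, 4, 5]]
```

On two triangles joined by one edge, the first level separates the triangles. The recursion then stops, because inside a triangle the greedy merging ends with a single cluster.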

Page 67: Document

Soft modularity Hierarchical clustering and visualization

Stopping criterion

A clustering algorithm always provides a solution, relevant or not!

Significance of a node clustering:

1 Generate random graphs with the same degree distribution as the original graph. Simulation process: the MCMC algorithm of [Roberts Jr., 2000], which repeatedly picks two edges at random and swaps their endpoints;

After Q|E| permutations (|E| being the number of edges), the resulting graph is approximately a uniform random draw from the set of graphs with the same degree distribution.

2 Optimize the modularity on these random graphs;

3 Compute the p-value of the observed modularity with respect to the empirical distribution obtained on the random graphs;

4 If the new clustering is not found to be significant, the algorithm stops.
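Steps 1 to 4 can be sketched as follows, as an illustration only. The edge swap below is a generic degree-preserving rewiring in the spirit of [Roberts Jr., 2000], not necessarily the exact chain used in the original work. `q_factor` plays the role of Q in Q|E|, and `optimize_q` stands for any modularity optimizer that returns the best modularity it finds. The (1 + count) / (1 + N) p-value convention, the default values of `n_graphs` and `alpha`, and all function names are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]


def degree_preserving_shuffle(edges: Sequence[Edge], n_swaps: int,
                              rng: random.Random) -> List[Edge]:
    """Randomize a simple undirected graph while keeping every node degree.

    Each step draws two edges (a, b) and (c, d) and rewires them into
    (a, d) and (c, b), or into (a, c) and (b, d) with probability 1/2.
    Swaps that would create a self-loop or a duplicate edge are rejected
    (the chain then stays in place for that step).
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """Share of null values at least as large as the observed one.

    Uses the (1 + count) / (1 + N) convention, so the result is never 0.
    """
    count = sum(1 for q in null_values if q >= observed)
    return (1 + count) / (1 + len(null_values))


def is_significant(edges: Sequence[Edge], observed_q: float,
                   optimize_q: Callable[[List[Edge]], float],
                   n_graphs: int = 99, q_factor: int = 10,
                   alpha: float = 0.05, seed: int = 0) -> bool:
    """Compare the observed modularity with its null distribution."""
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_graphs):
        # Step 1: q_factor * |E| swaps, playing the role of Q|E|.
        shuffled = degree_preserving_shuffle(edges, q_factor * len(edges), rng)
        # Step 2: optimize the modularity on the random graph.
        null_values.append(optimize_q(shuffled))
    # Step 3: p-value; step 4: the caller stops splitting if this is False.
    return empirical_p_value(observed_q, null_values) <= alpha


# A 10-node ring with two chords, randomized with 100 swaps.
ring = [(k, (k + 1) % 10) for k in range(10)] + [(0, 5), (2, 7)]
shuffled = degree_preserving_shuffle(ring, 100, random.Random(1))
```

Rejecting invalid swaps, rather than redrawing until a valid one is found, keeps the sketch short and leaves every degree unchanged at every step.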

Page 70: Document

Soft modularity Hierarchical clustering and visualization

Display a clustering hierarchy

Basics

• start from the first step of the hierarchical clustering;

• expand the clusters in order of minimal decrease in modularity.

Issues

1 taking into account the size of the clusters: see [Tunkelang, 1999] for a modification of the FDP algorithm to handle nodes of different sizes;

2 estimating in advance the space needed to represent a cluster when it is expanded.
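The expansion rule ("expand the clusters in order of minimal decrease in modularity") can be sketched as a greedy loop. At each step, each remaining cluster is tentatively replaced by its sub-clusters, the modularity of the resulting displayed partition is computed, and the cluster that loses the least is expanded. The sketch assumes a two-level hierarchy and hard partitions. The names `expansion_order`, `top` and `children` are hypothetical, and the sketch only addresses the ordering, not the layout issues listed above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[int, int, float]


def modularity(edges: List[Edge], labels: Dict[int, Hashable]) -> float:
    """Newman and Girvan modularity of the partition given by `labels`."""
    inside, total, two_m = defaultdict(float), defaultdict(float), 0.0
    for u, v, w in edges:
        two_m += 2 * w
        total[labels[u]] += w
        total[labels[v]] += w
        if labels[u] == labels[v]:
            inside[labels[u]] += 2 * w
    return sum(inside[k] / two_m - (total[k] / two_m) ** 2 for k in total)


def expansion_order(edges: List[Edge], top: Dict[int, int],
                    children: Dict[int, Dict[int, int]]) -> List[int]:
    """Greedy order in which to expand top-level clusters.

    `top` maps each node to its top-level cluster and `children[k]` maps the
    nodes of cluster k to their sub-cluster. At each step, the cluster whose
    expansion lowers the modularity of the displayed partition the least is
    expanded.
    """
    labels = {v: (k, None) for v, k in top.items()}   # nothing expanded yet
    remaining = set(children)
    order = []
    while remaining:
        current = modularity(edges, labels)

        def decrease(k: int) -> float:
            trial = dict(labels)
            for v, s in children[k].items():
                trial[v] = (k, s)
            return current - modularity(edges, trial)

        best = min(sorted(remaining), key=decrease)
        for v, s in children[best].items():
            labels[v] = (best, s)
        remaining.remove(best)
        order.append(best)
    return order


# Cluster 1: two triangles joined by one edge; cluster 0: a triangular
# prism (two triangles joined by three edges); one edge links them.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0),
         (4, 5, 1.0), (2, 3, 1.0), (6, 7, 1.0), (6, 8, 1.0), (7, 8, 1.0),
         (9, 10, 1.0), (9, 11, 1.0), (10, 11, 1.0), (8, 9, 1.0),
         (7, 10, 1.0), (6, 11, 1.0), (5, 6, 1.0)]
top = {v: 1 if v < 6 else 0 for v in range(12)}
children = {1: {v: int(v >= 3) for v in range(6)},
            0: {v: int(v >= 9) for v in range(6, 12)}}
print(expansion_order(edges, top, children))  # [1, 0]
```

The loosely connected pair of triangles is expanded first: splitting it even increases the modularity, whereas splitting the prism decreases it.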

Page 73: Document

Soft modularity Hierarchical clustering and visualization

Another medieval network...

From the same corpus of medieval documents:

• nodes: transactions and active individuals: 3 918 individuals and 6 455 transactions (10 373 nodes in total);

• edges model the active involvement of an individual in a transaction.
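The underlying data structure is a bipartite graph. The sketch below shows one way such a graph could be built from (individual, transaction) involvement records, assuming string identifiers. The record format and the name `bipartite_graph` are hypothetical. Each node is tagged with its type so that an individual and a transaction that share an identifier remain distinct. Repeated records collapse into a single edge.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]   # (node type, identifier)


def bipartite_graph(records: Iterable[Tuple[str, str]]) -> Dict[Node, Set[Node]]:
    """Adjacency sets of the individual-transaction graph.

    Each (individual, transaction) record adds an edge between the two
    nodes; repeated records collapse into a single edge.
    """
    adjacency: Dict[Node, Set[Node]] = defaultdict(set)
    for person, transaction in records:
        p, t = ("individual", person), ("transaction", transaction)
        adjacency[p].add(t)
        adjacency[t].add(p)
    return adjacency


# Hypothetical records, for illustration only.
records = [("P1", "T1"), ("P2", "T1"), ("P1", "T2"), ("P1", "T1")]
graph = bipartite_graph(records)
n_individuals = sum(1 for kind, _ in graph if kind == "individual")
n_transactions = sum(1 for kind, _ in graph if kind == "transaction")
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(n_individuals, n_transactions, n_edges)  # 2 2 3
```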

Modularity optimization: 48 clusters, with sizes ranging from 10 to 740 nodes.

Hierarchy: 4 levels, 89 clusters at the finest level.

Page 84: Document

Soft modularity Hierarchical clustering and visualization

Conclusion

Mining a graph from a clustering

• clustering can be used to provide a simplified representation of the network and to help the user understand its macroscopic structure;

• optimizing the modularity seems to provide better results than approaches based on the Laplacian (it helps to separate hubs and thus results in more balanced clusters);

• the approaches presented here are almost fully automated: solutions to tune the parameters are provided in the corresponding articles.

Perspectives: improve the representation of the hierarchy, incorporate additional information on nodes and edges in the clustering...

Thank you for your attention...

Page 86: Document

Soft modularity Hierarchical clustering and visualization

References

Auber, D., Chiricota, Y., Jourdan, F., and Melançon, G. (2003). Multiscale visualization of small world networks. In INFOVIS'03.

Auber, D. and Jourdan, F. (2005). Interactive refinement of multi-scale network clusterings. In International Conference on Information Visualisation, pages 703–709, Los Alamitos, CA, USA. IEEE Computer Society.

Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008). Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):1257–1273.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486:75–174.

Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Trans Knowl Data En, 19(3):355–369.

Fruchterman, T. and Reingold, E. (1991). Graph drawing by force-directed placement. Software Pract Exper, 21:1129–1164.

Herman, I., Melançon, G., and Scott Marshall, M. (2000). Graph visualization and navigation in information visualisation. 6(1):24–43.

Knuth, D. (1993). The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA.

Kohonen, T. and Somervuo, P. (1998). Self-organizing maps of symbol strings. Neurocomputing, 21:19–30.

Kohonen, T. and Somervuo, P. (2002). How to make large self-organizing maps for nonvectorial data. Neural Networks, 15(8):945–952.

Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning, pages 315–322.

Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys Rev E, 69:026113.

Noack, A. (2007). Energy models for graph clustering. J Graph Algorithms Appl, 11(2):453–480.

Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. In SEA '09: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268, Berlin, Heidelberg. Springer-Verlag.

Olteanu, M., Villa-Vialaneix, N., and Cottrell, M. (2013). On-line relational SOM for dissimilarity data. In Estevez, P., Principe, J., Zegers, P., and Barreto, G., editors, Advances in Self-Organizing Maps (Proceedings of WSOM 2012), volume 198 of AISC, pages 13–22, Springer Verlag, Berlin, Heidelberg. To appear.

Roberts Jr., J. M. (2000). Simple methods for simulating sociomatrices with given marginal totals. Social Networks, 22(3):273–283.

Rossi, F., Hasenfuss, A., and Hammer, B. (2007). Accelerating relational clustering algorithms with sparse prototype representation. In 6th International Workshop on Self-Organizing Maps (WSOM), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.

Rossi, F. and Villa-Vialaneix, N. (2010). Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach. Neurocomputing, 73(7-9):1142–1163.

Rossi, F. and Villa-Vialaneix, N. (2011). Représentation d'un grand réseau à partir d'une classification hiérarchique de ses sommets. Journal de la Société Française de Statistique, 152(3):34–65.

Seifi, M., Guillaume, J., Latapy, M., and Le Grand, B. (2010). Visualisation interactive multi-échelle des grands graphes : application à un réseau de blogs. In Atelier EGC 2010, Visualisation et Extraction de Connaissances, Hammamet, Tunisie.

Tunkelang, D. (1999). A Numerical Optimization Approach to General Graph Drawing. PhD thesis, School of Computer Science, Carnegie Mellon University. CMU-CS-98-189.

Villa, N. and Rossi, F. (2007). A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In 6th International Workshop on Self-Organizing Maps (WSOM), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.
