![Page 1: Complex Networks Analysis: Clustering Methodspeople.ee.ethz.ch/~nnefedov/CNA_lecture_01.pdf · 7 Stat. Physics approach • network analysis • statistical analysis (random networks,](https://reader031.vdocument.in/reader031/viewer/2022030511/5abbc81e7f8b9a24028d0360/html5/thumbnails/1.jpg)
Complex Networks Analysis: Clustering Methods
Spring 2013
ISI ETH Zurich
Nikolai Nefedov
Outline
Purpose: to give an overview of modern graph-clustering methods and their applications for analysis of complex dynamic networks.
Planned topics
• short introduction to complex networks
• discrete vector calculus, graph Laplacian, graph spectral analysis
• methods of community detection based on modularity maximization
• random walk on graphs, Laplacian dynamics, stability of community detection
• multi-layer graphs: clustering and regularization
• topology detection via system dynamics
• dynamic network analysis and missing links prediction
• applications for real-world datasets (multi-dimensional time series and network analysis)
Complex vs Complicated
Complex systems (no unique definition):
• a (large) number of interacting elements
• stochastic interactions
• no centralized authority, self-organized
• emergent properties: system behavior arises from the interaction structure; detailed understanding of elements in isolation is not enough, even if elements follow simple rules (chaotic behavior)
• evolving structures, system adaptation
• hierarchies, heavy tails, ...
Complex Systems => Statistical physics
• large-scale regularities
• microscopic origins of macroscopic behavior
• multiple (hierarchical) scales
Complex Systems
Complex Systems => Complex Networks
Stat. Physics approach
• a fixed level of abstraction
• vertices => interacting elements
• edges => interactions
• (statistical) analysis of network structure
• dynamical processes taking place on a network
• dynamics of a network
Graph theory approach (mostly static graphs)
• simple graphs => cuts, structure, factorization, spanning trees, ...
• multigraphs => multiple edges and self-loops
• hypergraphs => hyper-edge as a set of vertices
• multi-layer graphs => a set of graphs on the same vertices => tensors
• multiplexing graphs
Graph Theory
Origin: Leonhard Euler (1736)
L. Euler, Solutio problematis ad geometriam situs pertinentis, Comment. Academiae Sci. J. Petropolitanae 8, 128-140 (1736)
(Euler's theorem: when a graph can be drawn with a single line)
Königsberg
Stat. Physics approach
• network analysis
  • statistical analysis (random networks, small-world, scale-free networks)
  • network structure analysis
    • clustering
    • network partition
    • classification (taxonomy => hierarchical classification)
    • clustering => unsupervised classification (problem dependent), relates data to knowledge (basic human activity)
• dynamical processes taking place on a network
  • random walk, opinion (voting) dynamics, synchronization, game strategies, ...
  • convergence, stability, ...
  • distributed computations/control
• dynamics of a network
  • evolving networks
  • interplay between network topology and dynamics on a network
  • adaptive/learning networks
Complex Networks
Outline
Purpose: to give an overview of modern graph-clustering methods and their applications for analysis of complex dynamic networks.
Planned topics
• short introduction to complex networks
  • complex networks, definitions, basics
• graph partition
  • min-cut, normalized-cut, min-ratio-cut
• brief overview of vector calculus
  • differential operators (gradient, divergence, Laplace operator)
• graph Laplacian as a discrete version of the Laplace-Beltrami operator
• spectral analysis based on graph Laplacian
• limits of spectral analysis
Basics: Network Structure
Network or graph $G = (V, E)$ => set of vertices joined by edges,
$V = \{v_i\}$ set of vertices, $i = 1, \dots, N$,
$E = \{e(i,j)\}$ set of links/edges => (ordered) pairs of elements from $V$,
$\max |E| = N(N-1)/2$;
$v_i$ is a neighbor of $v_j$ if there is $e(i,j)$ in $E$
number of neighbors $k_i$ of a vertex $v_i$ is called its degree
in directed networks: in- and out-degrees $k_{in}$, $k_{out}$
edge density of the graph: $\rho = |E| / (N(N-1)/2)$
$\rho = 1$ => fully connected, $\rho \ll 1$ => sparse graph
Cycle/loop = closed path (distinct vertices/edges)
Graph types: regular, tree, forest, ...
Bipartite network: 2 types of nodes, links only between nodes of different types.
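The degree and edge-density definitions above can be tried on a small example. The following is a minimal sketch, assuming an undirected simple graph given as an edge list; the function name `degrees_and_density` and the toy graph are illustrative choices, not part of the lecture.

```python
from collections import defaultdict


def degrees_and_density(n_vertices, edges):
    """Return (degree list, edge density) for an undirected simple graph.

    Vertices are assumed to be labelled 0 .. n_vertices - 1.
    """
    neighbors = defaultdict(set)
    for i, j in edges:
        if i == j:
            continue  # self-loops are ignored in a simple graph
        neighbors[i].add(j)
        neighbors[j].add(i)
    degrees = [len(neighbors[v]) for v in range(n_vertices)]
    n_edges = sum(degrees) // 2  # every edge is counted at both endpoints
    max_edges = n_vertices * (n_vertices - 1) / 2
    return degrees, n_edges / max_edges


# Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
degrees, rho = degrees_and_density(4, edges)
print(degrees)  # [2, 2, 3, 1]
print(rho)      # 4 edges out of 6 possible pairs
```

Storing neighbors as sets makes duplicate edges harmless, which matches the simple-graph assumption.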
Basics: Network Structure
Shortest path between i and j => a path with the minimum number of edges
Distance $d(i,j)$ => measure associated with the shortest path between i and j
Average shortest distance: $\langle l \rangle = \frac{2}{N(N-1)} \sum_{i<j} d(i,j)$
Diameter of the graph: $d = \max_{i,j} d(i,j)$
Connected graph: there is a path between any pair of nodes
Minimal connected graph => no loops => tree, $|E| = N - 1$ edges
Forest => collection of trees
Fully connected (complete) graph: $d(i,j) = 1$ for all $i \ne j$, $|E| = N(N-1)/2$
Adjacency matrix: $A(i,j) = 1$ if $e(i,j) \in E$, 0 otherwise
Clique: a fully connected subgraph
k-clique: clique with k vertices
Motifs: subgraphs which occur often in a network (w.r.t. a null model)
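For unweighted graphs the distance d(i,j) can be obtained by breadth-first search. The sketch below, assuming a connected undirected graph stored as an adjacency dictionary, computes the average shortest distance and the diameter; the helper names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length_and_diameter(adj):
    """Return (<l>, diameter) for a connected undirected graph."""
    n = len(adj)
    total = 0
    diameter = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    # total runs over ordered pairs, i.e. each unordered pair twice,
    # so total / (N (N - 1)) equals 2 * sum_{i<j} d(i,j) / (N (N - 1)).
    return total / (n * (n - 1)), diameter


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
avg_l, diam = average_path_length_and_diameter(path)
print(avg_l, diam)
```

One BFS per vertex costs O(N |E|) overall, which is fine for small graphs but grows quickly for large ones.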
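Basics: Network Structure
Centrality measures:
node degree = number of neighbors
Closeness centrality: measures how far (on average) a vertex is from all other vertices
$d_c(i) = 1 / \sum_{j \ne i} d(i,j)$
Betweenness centrality = number of shortest paths going through a vertex/edge; measures the amount of flow through a vertex/edge; computationally demanding.
$b_i = \sum_{l,m} d_i(l,m) / d(l,m)$
$d(l,m)$: number of shortest paths between l and m; $d_i(l,m)$: number of shortest paths between l and m going through node i
Clustering coefficient of a node
$C_i = \frac{1}{k_i (k_i - 1)} \sum_{j \ne k}^{N} e_{ij} e_{jk} e_{ki} = \frac{2 E_i}{k_i (k_i - 1)}$
where $e_{ij} = 1$ if vertices i and j are linked (0 otherwise) and $E_i$ is the number of edges among the neighbors of node i
Average clustering coefficient of a graph: $C_G = \sum_i C_i / N$; a related global measure is the ratio triangles / connected triples (each triangle counted once per closed triple)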
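The node clustering coefficient can be computed directly from neighbor sets. This is a minimal sketch using the form C_i = 2 E_i / (k_i (k_i - 1)), with E_i the number of links among the neighbors of i; setting C_i = 0 for vertices of degree below 2 is a convention assumed here, not something stated in the slides.

```python
def local_clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)); adj maps each vertex to a set of neighbors."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for degree 0 or 1
    # Count each linked pair of neighbors once (u < v).
    links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))   # one link among three neighbors
print(average_clustering(adj))
```

Vertex 2 has three neighbors with a single link among them, so only one of its three possible neighbor pairs is closed.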
Network: Statistical characterization
Degree distribution $p(k)$ => probability that a randomly chosen vertex has degree k
$P(k|k')$ => conditional probability that a vertex of degree k is connected to a vertex of degree k'
Average degree: $\langle k \rangle = 2|E|/N$
Sparse graphs: $\langle k \rangle \ll N$
Average degree of nearest neighbors of node i: $k_{nn,i} = \frac{1}{k_i} \sum_{j \in \mathcal{N}(i)} k_j$, with $\mathcal{N}(i)$ the set of neighbors of i
Average degree fluctuations: $\langle k^2 \rangle$
Clustering spectrum (of vertices which have the same degree)
Topological heterogeneity:
• homogeneous networks: light tails
• heterogeneous networks: skewed, heavy tails
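These statistics are straightforward to estimate from a given graph. The sketch below computes the empirical degree distribution, the first two degree moments, and the average degree of the nearest neighbors of a node (here taken to be the mean of k_j over the neighbors j of node i). Function names and the star-graph example are illustrative assumptions.

```python
from collections import Counter


def degree_statistics(adj):
    """Return (p_k, <k>, <k^2>) for a graph given as {vertex: set of neighbors}."""
    degrees = [len(adj[v]) for v in adj]
    n = len(degrees)
    counts = Counter(degrees)
    p_k = {k: c / n for k, c in sorted(counts.items())}
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return p_k, mean_k, mean_k2


def neighbor_degree(adj, i):
    """Mean degree of the neighbors of i (assumes i has at least one neighbor)."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])


# Star graph: hub 0 joined to four leaves, a small heterogeneous example.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
p_k, mean_k, mean_k2 = degree_statistics(star)
print(p_k, mean_k, mean_k2)
print(neighbor_degree(star, 0), neighbor_degree(star, 1))
```

In the star graph the second moment is much larger than the squared mean, which is the signature of a heterogeneous degree sequence.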
Stochastic Networks
Stochastic network => not a single graph, but a statistical ensemble
Erdős–Rényi (random) networks: $G(N,p)$
- connect N vertices randomly, each pair is connected with probability p
- ensemble of possible realizations: network properties => averages over the ensemble
- average number of edges: $\langle E \rangle = p N (N-1)/2$
- average degree: $\langle k \rangle = 2 \langle E \rangle / N = p (N-1) \approx pN$
Clustering coefficient $C_G$ ~ triangles / connected triples
E-R networks: $C_{ER} = p \approx \langle k \rangle / N$
practically there is no clustering; large random networks are tree-like networks
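The ensemble picture can be checked numerically: sample many realizations of G(N,p) and compare the measured average number of edges with pN(N-1)/2. This is a minimal sketch; the function names, the seed, and the parameter values are arbitrary choices made for illustration.

```python
import random
from itertools import combinations


def sample_er_graph(n, p, rng):
    """One realization of G(N, p): each pair is linked independently with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def ensemble_average_edges(n, p, realizations, seed=0):
    """Average number of edges over several independent realizations."""
    rng = random.Random(seed)
    total = sum(len(sample_er_graph(n, p, rng)) for _ in range(realizations))
    return total / realizations


n, p = 50, 0.1
expected_edges = p * n * (n - 1) / 2          # <E> = p N (N - 1) / 2
measured_edges = ensemble_average_edges(n, p, realizations=200)
print(expected_edges, measured_edges)
print(2 * measured_edges / n, p * (n - 1))    # measured vs expected <k>
```

A single realization can deviate noticeably from the expected value; only the ensemble average is expected to approach it.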
Erdős–Rényi Networks
Example N = 3, p = 1/3
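For N = 3 the whole ensemble is small enough to enumerate: there are 3 possible edges and therefore 8 graphs. The sketch below lists them with exact probabilities for p = 1/3, assuming labelled vertices; `er_ensemble` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations, product


def er_ensemble(n, p):
    """All labelled graphs on n vertices with their G(N, p) probabilities."""
    pairs = list(combinations(range(n), 2))
    ensemble = []
    for present in product([0, 1], repeat=len(pairs)):
        edges = [pair for pair, bit in zip(pairs, present) if bit]
        m = len(edges)
        prob = p**m * (1 - p) ** (len(pairs) - m)
        ensemble.append((edges, prob))
    return ensemble


p = Fraction(1, 3)
ensemble = er_ensemble(3, p)

# Group the 8 realizations by their number of edges.
prob_by_edges = {}
for edges, prob in ensemble:
    prob_by_edges[len(edges)] = prob_by_edges.get(len(edges), 0) + prob

mean_edges = sum(m * prob for m, prob in prob_by_edges.items())
for m, prob in sorted(prob_by_edges.items()):
    print(m, prob)
print(mean_edges)  # 1, which equals p N (N - 1) / 2 for N = 3, p = 1/3
```

Exact fractions are used so that the probabilities sum to 1 without rounding error.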
Erdős–Rényi Networks
Probability that vertex i has degree k
• connected to k vertices,
• not connected to the other N – k – 1
$p_i(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$
Degree distribution for the whole network
$P_k = \sum_{i=1}^{N} p_i(k) / N$
For $N \to \infty$ s.t. $\langle k \rangle = \text{const}$:
$P_k = \frac{(pN)^k}{k!} \exp(-pN) = \frac{\langle k \rangle^k}{k!} e^{-\langle k \rangle}$ => Poisson distribution
For E-R networks, average degree: $\langle k \rangle = 2 \langle E \rangle / N = p (N-1) \approx pN$
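The convergence of the binomial degree distribution to the Poisson form can be checked numerically by keeping the average degree fixed while N grows. This is a sketch under that assumption (p is set to ⟨k⟩/N); degrees are compared only up to k = 30 to avoid numerical overflow in the binomial coefficients.

```python
from math import comb, exp, factorial


def binomial_degree_prob(k, n, p):
    """p_i(k) = C(N-1, k) p^k (1 - p)^(N-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_degree_prob(k, mean_k):
    """P_k = <k>^k exp(-<k>) / k!."""
    return mean_k**k * exp(-mean_k) / factorial(k)


mean_k = 4.0
gaps = []
for n in (10, 100, 10_000):
    p = mean_k / n  # keep <k> ~ pN constant as N grows
    k_values = range(min(n, 31))  # degrees 0 .. min(N - 1, 30)
    gap = max(
        abs(binomial_degree_prob(k, n, p) - poisson_degree_prob(k, mean_k))
        for k in k_values
    )
    gaps.append(gap)
    print(n, gap)
```

The largest pointwise difference shrinks as N increases, in line with the limit stated above.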
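Erdős–Rényi Networks
Connected component sizes ($N \to \infty$ s.t. $\langle k \rangle = \text{const}$):
• $\langle k \rangle < 1$: many small subgraphs
• $\langle k \rangle = 1$: phase transition (percolation)
• $\langle k \rangle \gg 1$: giant component + small subgraphs
[Figure: relative giant component size and mean component size vs. $\langle k \rangle$; regions labelled "small subgraphs" and "giant component"]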
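The percolation transition can be observed in a small simulation that measures the relative size of the largest connected component for several values of the average degree. This is a rough finite-size sketch (N = 600, one realization per value, fixed seed), with a simple union-find structure chosen here for component tracking; none of these choices come from the slides.

```python
import random
from itertools import combinations


def largest_component_fraction(n, mean_k, rng):
    """Relative size of the largest component in one G(N, p) sample with p = <k>/(N-1)."""
    p = mean_k / (n - 1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


rng = random.Random(1)
giant_fraction = {c: largest_component_fraction(600, c, rng) for c in (0.5, 1.0, 3.0)}
for c, s in giant_fraction.items():
    print(c, s)
```

Below the transition the largest component covers only a small fraction of the vertices, while well above it most vertices belong to one component. Near ⟨k⟩ = 1 the result fluctuates strongly between realizations.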
Erdős–Rényi Networks
• Degree distribution: Poisson (degrees of all nodes close to average)
• No correlations, all edges exist independently of each other
• Path lengths grow logarithmically with system size, $\langle l \rangle \sim \ln N$
• Connectivity depends on average degree $\langle k \rangle$:
  small $\langle k \rangle$ => several disjoint components,
  high $\langle k \rangle$ => giant connected component;
  there is a percolation phase transition (from a fragmented to a connected network)
• Very "homogeneous" networks
Real-World Networks
| Network type | Shortest path | Clustering |
|---|---|---|
| Random networks | Short | Low |
| Real networks | Short | High |
| Regular-topology networks * | Long | High |

* [Watts & Strogatz 1998]
Random vs Real-World Networks
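Degree distributions
• Random networks: Poisson distribution, $P_k = \frac{\langle k \rangle^k}{k!} e^{-\langle k \rangle}$
• Real-world networks: heavy-tail distributions (often power law, plotted on logarithmic axes) [Barabási & Albert, 1999]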
Network Models: Small-World
D.J. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks", Nature 393, 440–442, 1998
WS model:
• Take a regular clustered network
• Rewire the endpoint of each link to a random node with probability p
• SWN => a simple model for interpolating between regular and random networks
• Randomness controlled by a single tuning parameter
• N >> k >> ln(N) >> 1
Degree distribution and clustering coefficient of the WS model (k > 2): the clustering coefficient is independent of system size [Barrat & Weigt, 2000]
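The rewiring procedure can be sketched in a few lines. The version below assumes a ring lattice with even k as the regular clustered network and forbids self-loops and duplicate links during rewiring; these details, and the function name `watts_strogatz`, are assumptions made for illustration rather than the exact construction used in the lecture.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring lattice of n nodes (each linked to its k nearest neighbors), then rewiring.

    Returns an adjacency dictionary {node: set of neighbors}.
    """
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit every lattice link (v, v + step) once; with probability p move
    # its far endpoint to a node that is not yet a neighbor of v.
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if not candidates:
                    continue
                new_w = rng.choice(candidates)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(new_w)
                adj[new_w].add(v)
    return adj


regular = watts_strogatz(20, 4, 0.0, random.Random(0))
rewired = watts_strogatz(20, 4, 0.3, random.Random(0))
print(sorted(len(nb) for nb in regular.values()))
print(sorted(len(nb) for nb in rewired.values()))
```

With p = 0 every node keeps degree k; with p > 0 the number of links is unchanged but individual degrees start to vary slightly.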
Network Models: Small-World Networks
"Small-World Network": short paths, high clustering
[Figure: path length and clustering vs. rewiring probability p, from regular network to random network; N = 1000, k = 10, average over 20 realizations at each p] [Watts & Strogatz]
Network Models: Small-World Networks
Dynamics of synchronization, virus spreading: a small number of shortcuts greatly speeds up the process: 3% shortcuts => 50% epidemic
Network structure strongly affects processes taking place on networks
[Figure: epidemic size (number of infected) vs. density of shortcuts] [Watts & Strogatz]
Network Models: Scale-Free Networks
A.-L. Barabási & R. Albert, Emergence of Scaling in Random Networks, Science 286, 509 (1999)
[Figure: degree distributions on logarithmic axes; power-law distribution]
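Power Law Distributions
Scale invariance: $F(k)$ such that $F(\alpha k) = \alpha^{D} F(k)$
Power-law: $P(k) = A k^{-\gamma}$, $P(\alpha k) = A (\alpha k)^{-\gamma} = \alpha^{-\gamma} P(k)$ <=> shift on log scale
Power-law tails, $k > k_{min}$: $P(k) = (\gamma - 1)\, k_{min}^{\gamma - 1}\, k^{-\gamma}$
$\langle k^n \rangle = \int_{k_{min}}^{\infty} k^n P(k)\, dk = k_{min}^{n} \frac{\gamma - 1}{\gamma - 1 - n}$ for $\gamma > n + 1$
$1 < \gamma < 2 \Rightarrow \langle k \rangle \to \infty$
$2 < \gamma < 3 \Rightarrow \langle k^2 \rangle \to \infty$
only moments up to order $\lfloor \gamma - 1 \rfloor$ exist: $\langle k^{\lfloor \gamma - 1 \rfloor} \rangle < \infty$
Fluctuations, level of heterogeneity: diverging degree fluctuations for $\gamma < 3$; $k_c$ = cut-off due to finite size
for most real-world networks $2 < \gamma < 3$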
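The moment formula and the role of the cut-off can be illustrated numerically. The sketch below assumes the continuous power law P(k) = (γ-1) k_min^(γ-1) k^(-γ) for k > k_min and evaluates the moment integral in closed form, once over the full range and once truncated at a cut-off k_c (without renormalizing P); function names are illustrative.

```python
def power_law_moment(n, gamma, k_min=1.0):
    """<k^n> = k_min^n (gamma - 1) / (gamma - 1 - n), finite only for gamma > n + 1."""
    if gamma <= n + 1:
        return float("inf")
    return k_min**n * (gamma - 1) / (gamma - 1 - n)


def truncated_moment(n, gamma, k_min, k_c):
    """Integral of k^n P(k) from k_min to the cut-off k_c (P is not renormalized)."""
    a = n + 1 - gamma
    if a == 0:
        raise ValueError("logarithmic case gamma = n + 1 is not handled")
    return (gamma - 1) * k_min ** (gamma - 1) * (k_c**a - k_min**a) / a


gamma = 2.5  # typical range 2 < gamma < 3
print(power_law_moment(1, gamma))   # finite mean
print(power_law_moment(2, gamma))   # inf: diverging fluctuations
for k_c in (1e2, 1e4, 1e6):
    # With a finite cut-off the second moment is finite but grows with k_c.
    print(k_c, truncated_moment(2, gamma, 1.0, k_c))
```

For γ = 2.5 the mean converges while the truncated second moment keeps growing with the cut-off, which is the finite-size version of the divergence listed above.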
Power Law Distributions
[Figure: power-law vs. Poisson $P_k = \langle k \rangle^k \exp(-\langle k \rangle) / k!$ degree distributions, logarithmic axes]
Networks with Power Law Distributions => Scale-Free Networks
no characteristic scale (node degree) in the distribution
Barabási-Albert Model: Scale-Free Networks
Where do networks come from? Networks are not static => growing networks
B-A model of network growth
• based on the principle of preferential attachment: "the rich get richer"
• results in networks with a power-law degree distribution (average degree $\langle k \rangle = 2m$)
1. Take a small seed network, e.g. a few connected nodes
2. Let a new node of degree m enter the network
3. Connect the new node to existing nodes such that the probability of connecting to node i of degree $k_i$ is $\pi_i = k_i / \sum_j k_j$
Degree distribution: $P_k = 2m^2 / k^3$
Other properties: average shortest path length, clustering coefficient
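The three growth steps can be sketched as follows. Preferential attachment is implemented with the usual trick of sampling from a list in which each node appears once per incident link, so that node i is picked with probability k_i / Σ k_j. The seed (a complete graph on m + 1 nodes) and the rejection of repeated targets are assumptions of this sketch; the latter slightly perturbs the attachment probabilities for the second and later links of each new node.

```python
import random


def barabasi_albert(n, m, rng):
    """Grow a network to n nodes; every new node attaches to m distinct existing nodes."""
    # Step 1: seed network, assumed here to be a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list picks node i with probability k_i / sum_j k_j.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):          # Step 2: a new node enters
        targets = set()
        while len(targets) < m:          # Step 3: preferential attachment
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = barabasi_albert(200, 2, random.Random(7))
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
mean_degree = sum(degree.values()) / len(degree)
print(len(edges), mean_degree, max(degree.values()))
```

The mean degree comes out close to 2m, and a few early nodes typically collect far more links than the rest.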
Network Models
[Figure: example networks: random (p = 0.02), small world (p = 0.1), scale free ($\langle k \rangle = 2$)]
Network Models: Summary
Erdős–Rényi model
• short path lengths
• Poisson distribution (no hubs)
• no clustering
Barabási-Albert scale-free model
• short path lengths
• power-law distribution for degrees
• robustness
• no clustering (may be fixed)
Real-world networks
• short path lengths
• high clustering
• broad degree distributions, often power laws
Watts-Strogatz Small World model
• short path lengths
• high clustering (N independent)
• almost constant degrees
Similarity Graphs
Graphs embedded in space
• Euclidean distance (L2 norm)
• Manhattan distance (L1 norm)
• Cosine similarity
Graphs built from data:
• data points from Euclidean space, sampling of some underlying distribution, ...
• connectivity parameter: k (KNN), ε-neighborhood graph, ...
• similarity measure => fully connected (weighted) matrix
Graphs not embedded in space
Neighborhood measures
- structural equivalence: share the same neighbors => Jaccard coefficient
- regular equivalence: if neighbors of a node are similar => Pearson correlation coefficient
Path dependent measures
Measures based on random walk:
- commute-time: average number of steps for a random walker to hit a target and return
- escape probability: probability to hit a target before coming back
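A few of these measures are simple enough to write down directly. The sketch below covers the three spatial measures, an ε-neighborhood graph built from data points, and the Jaccard coefficient of two nodes' neighbor sets; all function names are illustrative, and returning 0 for two nodes without any neighbors is an assumed convention.

```python
from math import sqrt


def euclidean(x, y):
    """L2 distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """L1 distance between two points."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def epsilon_graph(points, eps):
    """Link every pair of points whose Euclidean distance is at most eps."""
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if euclidean(points[i], points[j]) <= eps
    ]


def jaccard(adj, i, j):
    """Shared neighbors of i and j divided by the size of the union of their neighbors."""
    union = adj[i] | adj[j]
    if not union:
        return 0.0  # assumed convention when neither node has neighbors
    return len(adj[i] & adj[j]) / len(union)


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(epsilon_graph(points, eps=1.5))  # [(0, 1)]
adj = {0: {2, 3}, 1: {2, 3, 4}}
print(jaccard(adj, 0, 1))
```

The choice of ε (or of k in a KNN graph) controls how dense the resulting graph is, so it usually has to be tuned to the data.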