TRANSCRIPT of bindel/present/2011-09-scan.pdf
Communities, Spectral Clustering, and Random Walks

David Bindel
Department of Computer Science, Cornell University

26 Sep 2011
(figure: a 30-node example network)
Basic setting

Informal approach: a community is an unusually tightly connected set of nodes in a network.

Formal version: given a graph G = (V, E), seek a subgraph G′ = (V′, E′):

1. Based on optimality properties (cut size, modularity, etc.)
2. Based on dynamics on G (random walks and variants)
Two approaches unified by linear algebra!
(For today, all graphs are undirected, most are unweighted.)
Unusually tightly connected?

What constitutes “unusual” connectivity of a subgraph?
- High internal connectivity?
- Low external connectivity?
Basic notation

The adjacency matrix A ∈ {0,1}^{n×n} for G is

    A_ij = 1 if (i, j) ∈ E, 0 otherwise.

Also define

    e = vector of n ones
    d = Ae = degree vector
    D = diag(d)
    L = D − A = graph Laplacian
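These definitions are easy to check numerically. A minimal NumPy sketch, using a small illustrative graph (two triangles joined by a bridge; not a graph from the talk):

```python
import numpy as np

# Illustrative undirected graph: triangles {0,1,2} and {3,4,5}
# joined by the single bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

e = np.ones(n)    # vector of n ones
d = A @ e         # degree vector
D = np.diag(d)    # degree matrix
L = D - A         # graph Laplacian

# The Laplacian annihilates the constant vector: its row sums are zero.
assert np.allclose(L @ e, 0)
print(d)  # degrees: 2 for triangle corners, 3 for the bridge endpoints
```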
Measuring subgraphs of G

Indicate V′ ⊆ V by s ∈ {0,1}^n. Many properties of the induced subgraph can be written via quadratic forms:

    s^T A s = |E′| = number of (directed) edges in subgraph
    s^T D s = number of (directed) edges incident on subgraph
    s^T L s = s^T (D − A) s = edges between V′ and V ∖ V′

Example: e indicates all of V, and

    m = e^T A e = e^T D e = number of (directed) edges
    0 = e^T L e = edges between V and ∅
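A quick numerical check of these quadratic forms, on an illustrative toy graph (two triangles joined by a bridge, not from the talk), with s indicating one triangle:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A

s = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # indicator of V' = {0,1,2}

print(s @ A @ s)  # 6.0: directed edges inside V' (each undirected edge twice)
print(s @ D @ s)  # 7.0: directed edges incident on V'
print(s @ L @ s)  # 1.0: edges cut between V' and its complement
```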
Configuration model

Configuration model for a random graph G = (V, E):
- The degree vector d is specified.
- P{(i, j) ∈ E} = d_i d_j / m.

Self-loops are allowed.

The expected adjacency matrix, degree vector and matrix, and Laplacian are

    Ā = d d^T / m
    d̄ = Ā e = d
    D̄ = D
    L̄ = D̄ − Ā
Modularity

Define B = A − Ā = L̄ − L. Then

    s^T B s = unexpected extra edges in subgraph
            = unexpected lack of cut edges

If s_1, ..., s_c indicate a partition into c sets,

    Q := (1/2) Σ_{j=1}^c s_j^T B s_j = modularity of partition
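The modularity matrix and Q are a few lines of NumPy. A sketch on the toy two-triangle graph; note it follows the slides' normalization Q = (1/2) Σ_j s_j^T B s_j, whereas many references divide by m = 2|E| instead:

```python
import numpy as np

# Two triangles joined by a bridge; partition into {0,1,2} and {3,4,5}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
m = d.sum()                    # m = e^T A e = number of directed edges (14 here)

B = A - np.outer(d, d) / m     # modularity matrix B = A - A_bar

S = [np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]),
     np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])]
Q = 0.5 * sum(s @ B @ s for s in S)
print(Q)  # modularity under the slides' normalization
```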
Bisection by optimization

Idea: Find s ∈ {0,1}^n such that e^T s = n/2 and
- s^T L s is minimized (min cut), or
- s^T B s is maximized (max modularity).

Equivalently: find s ∈ {−1/2, 1/2}^n such that e^T s = 0 and
- s^T L s is minimized, or
- s^T B s is maximized
(the quadratics are unchanged by the shift s → s + e/2, since Le = Be = 0).

Oops — NP-hard!
Spectral bisection

Relaxation makes the problem easier:
- Hard: minimize s^T L s s.t. e^T s = 0, s ∈ {−1/2, 1/2}^n.
- Easy: minimize v^T L v s.t. e^T v = 0, v ∈ R^n, ‖v‖₂² = n/4.

Now v is an eigenvector for the second smallest eigenvalue of L. Use the sign pattern of v to partition ⟹ spectral bisection. The heuristic works well in practice (often with some refinement).
Same idea works for modularity.
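The relaxed problem is a small eigenvalue computation. A sketch of spectral bisection via the Fiedler vector, on the illustrative two-triangle graph:

```python
import numpy as np

# Fiedler-vector bisection on the toy two-triangle graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
lam, V = np.linalg.eigh(L)
fiedler = V[:, 1]
part = fiedler > 0             # sign pattern gives the bisection

print(part)  # one triangle on each side (overall sign is arbitrary)
```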
Rayleigh quotients

Given matrices (K, M), the generalized Rayleigh quotient is

    ρ_{K,M}(x) = (x^T K x) / (x^T M x).

It can represent interesting subgraph properties:

    ρ_{A,I}(s) = mean internal degree in subgraph
    ρ_{L,I}(s) = edges cut between V′ and V ∖ V′, per node of V′
    ρ_{A,D}(s) = fraction of incident edges internal to V′
    ρ_{L,D}(s) = fraction of incident edges cut
    ρ_{B,I}(s) = mean “surprising” internal degree in subgraph
    ρ_{B,D}(s) = mean fraction of internal degree that is surprising
    ρ_{B,L}(s) = fraction of edge cuts that are surprising
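A few of these quotients, checked on the toy two-triangle graph with s indicating one triangle:

```python
import numpy as np

def rho(K, M, x):
    """Generalized Rayleigh quotient x^T K x / x^T M x."""
    return (x @ K @ x) / (x @ M @ x)

# Two triangles {0,1,2}, {3,4,5} joined by the bridge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A
I = np.eye(6)
s = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

print(rho(A, I, s))  # 2.0: mean internal degree of the triangle
print(rho(A, D, s))  # 6/7: fraction of incident edges internal to V'
print(rho(L, D, s))  # 1/7: fraction of incident edges cut
```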
Rayleigh quotients and eigenvalues

Suppose M is positive definite. Basic connection:

    ρ_{K,M} stationary at x ⟺ Kx = ρ_{K,M}(x) Mx

Stationary points are (generalized) eigenvectors; stationary values are eigenvalues. Reasonable to compute (even though the optimization is nonconvex!).
Limits of Rayleigh quotients

But small variations kill us:

    max_{x≠0} (x^T A x) / ‖x‖₂² = λ_max(A),  but

    max_{x≠0} (x^T A x) / ‖x‖₁² = 1 − ω⁻¹,

where ω is the max clique size (Motzkin–Straus).
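The Motzkin–Straus bound can be sanity-checked on complete graphs, where the uniform simplex vector attains it:

```python
import numpy as np

# Motzkin-Straus: max over x != 0 of x^T A x / ||x||_1^2 equals 1 - 1/omega,
# where omega is the max clique size. On a complete graph K_omega
# the uniform vector on the simplex attains the bound.
for omega in [2, 3, 4, 5]:
    A = np.ones((omega, omega)) - np.eye(omega)  # adjacency of K_omega
    x = np.ones(omega) / omega                   # uniform point, ||x||_1 = 1
    val = (x @ A @ x) / np.sum(np.abs(x)) ** 2
    assert np.isclose(val, 1 - 1 / omega)
print("uniform x on K_omega attains 1 - 1/omega")
```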
Rayleigh quotients and eigenproblems

For M positive definite, we have the generalized eigendecomposition

    W^T M W = I and W^T K W = Λ = diag(λ_1, ..., λ_n).

For any x, the generalized Rayleigh quotient is a weighted average of eigenvalues:

    ρ_{K,M}(x) = Σ_{j=1}^n λ_j z_j²,

where z = W⁻¹x / ‖W⁻¹x‖₂. Therefore:

1. λ_max = max_{x≠0} ρ_{K,M}(x)
2. If ρ_{K,M}(s) is near λ_max, most weight is on large eigenvalues, so s nearly lies in the invariant subspace associated with the large eigenvalues.

So look at invariant subspaces for extreme eigenvalues.
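The weighted-average identity is easy to verify with SciPy's generalized symmetric eigensolver (the random K and M here are purely illustrative):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Random symmetric K and positive definite M.
n = 5
K = rng.standard_normal((n, n)); K = K + K.T
R = rng.standard_normal((n, n)); M = R @ R.T + n * np.eye(n)

# scipy.linalg.eigh(K, M) returns W with W^T M W = I and W^T K W = diag(lam).
lam, W = eigh(K, M)

x = rng.standard_normal(n)
z = np.linalg.solve(W, x)
z /= np.linalg.norm(z)

rho = (x @ K @ x) / (x @ M @ x)
print(np.isclose(rho, np.sum(lam * z**2)))  # rho is a weighted average
```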
Another reason to look at subspaces

Spectrum of a G_{n,p} graph:
- One large eigenvalue ≈ np
- Other eigenvalues between ≈ ±√(np(1−p)/4)
- Adjacency matrix = p ee^T + “noise”

Composite model: A ≈ S diag(β) S^T, S ∈ {0,1}^{n×c}
- Motivation: possibly-overlapping random graphs
- Columns of S are one basis for the range space
- Want to go from some general basis back to S
Indicators from subspaces, take 1

U spans a small subspace (e.g. an invariant subspace).

1. If span(u_1, ..., u_c) ≈ span(s_1, ..., s_c) where {s_j} indicate a partition, rows of U in the same partition are identical. Idea: treat rows of U as latent coordinates; cluster them.
2. Suppose we have some indicator s ≈ Uy. Then row U(j, :)
   - forms an acute angle with y when s_j = 1;
   - is almost normal to y when s_j = 0.

Clustering? What if sets overlap?
Indicators from subspaces, take 2

Suppose s ≈ Uy for some y, with s_i = 1. Want to find s. Try optimization (a linear program):

    minimize ‖s‖₁     (proxy for sparsity of s)
    s.t.  s = Uy      (s in the right space)
          s_i ≥ 1     (“seed” constraint)
          s ≥ 0       (componentwise nonnegativity)

This recovers the smallest set containing node i if
- U = SY⁻¹ exactly, and
- each set contains at least one element only in that set.

(Frequently works if there is not “too much” overlap.)

What about noise? Generally need a thresholding strategy.
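In the noise-free case the LP can be posed directly in the coordinates y. A sketch using scipy.optimize.linprog on the toy two-triangle graph, with U an orthonormal basis for the span of the two true indicators (the setup is illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Indicators of the two triangles in the toy graph; U spans their space.
s1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
s2 = 1.0 - s1
U = np.column_stack([s1 / np.sqrt(3), s2 / np.sqrt(3)])

i = 0  # seed node, known to be in the first community

# Substitute s = U y; since s >= 0, ||s||_1 = e^T U y, a linear objective in y.
c = np.ones(6) @ U
A_ub = np.vstack([-U,          # -s <= 0, i.e. s >= 0
                  -U[i]])      # -s_i <= -1, i.e. s_i >= 1
b_ub = np.concatenate([np.zeros(6), [-1.0]])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
s = U @ res.x
print(np.round(s, 6))  # ≈ [1, 1, 1, 0, 0, 0]: the seed's community
```

Note that linprog's default bounds are nonnegative, so the free variable y needs explicit `bounds=(None, None)`.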
Indicators from subspaces, take 3

Alternate optimization (box-constrained quadratic program):

    minimize (1/2) s^T P s + τ‖s‖₁
    s.t.  s_i ≥ 1
          s ≥ 0

Recovers the LP with P = I − UU^T and τ → 0 (assuming U^T U = I).
- Can let P be a more general semidefinite matrix (e.g. P = L)
- Size of τ controls sparsity (choice can be automated)
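A sketch of the box-constrained QP using SciPy's general-purpose L-BFGS-B solver in place of a dedicated QP code (the graph, τ value, and seed are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

# P = I - U U^T for the toy two-triangle subspace; seed constraint s_0 >= 1
# and nonnegativity become simple bounds.
s1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
s2 = 1.0 - s1
U = np.column_stack([s1 / np.sqrt(3), s2 / np.sqrt(3)])
P = np.eye(6) - U @ U.T
tau = 0.01

def obj(s):
    return 0.5 * s @ P @ s + tau * np.sum(s)   # s >= 0, so ||s||_1 = e^T s

def grad(s):
    return P @ s + tau * np.ones(6)

bounds = [(1.0, None)] + [(0.0, None)] * 5     # s_0 >= 1, the rest >= 0
res = minimize(obj, x0=np.ones(6), jac=grad, bounds=bounds, method="L-BFGS-B")
print(np.round(res.x, 3))  # close to the indicator of the seed's triangle
```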
Summary so far

Two pieces to spectral community detection:
- Pull out an invariant subspace
- Mine the subspace for community structure
The random walker
The random walker, take 1

Lazy random walk on a graph:

    p_{k+1} = T̂ p_k = (1/2)(I + T) p_k → p_∞ = d/m,

where T = AD⁻¹ is a transition matrix (column stochastic).

Idea: extract community structure from random walk dynamics.
- Start at a node i and take a few steps
- Rapidly explore the local community (only one?)
- Probability “leaks” into adjoining communities (slowly?)
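A sketch of the lazy walk on the toy two-triangle graph: a few steps stay mostly inside the seed's community, and the iteration eventually reaches p_∞ = d/m:

```python
import numpy as np

# Lazy random walk started at node 0 of the two-triangle graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
m = d.sum()
T = A / d                  # T = A D^{-1}: column j divided by d_j
That = 0.5 * (np.eye(6) + T)

p = np.zeros(6); p[0] = 1.0
for k in range(5):
    p = That @ p
print(np.round(p, 3))      # after a few steps: mostly inside the seed's triangle

for k in range(500):
    p = That @ p
assert np.allclose(p, d / m, atol=1e-8)   # long-run limit d/m
```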
The random walker, take 2

If a random walk starts at known i and goes k steps:

    p_k(j) = P{end at j | start at i} = e_j^T T^k e_i

If we end at known j, with a uniform prior on the starting point, then

    q_k(i) = P{start at i | end at j} = (1/Z_{j,k}) e_j^T T^k e_i

Idea: extract structure from how fast we forget starting points.
- Day 1: David came up with a funny joke!
- Day 2: There’s a joke going around the CS department.
- Day 3: I read this bad joke while browsing the web...
Simon-Ando theory

Markov chain with loosely-coupled subchains. Dynamics are:
- Rapid local mixing: after a few steps,

      p_k ≈ Σ_{j=1}^c α_{j,k} p_∞^{(j)},

  where p_∞^{(j)} is a local equilibrium for the jth subchain.
- Slow equilibration: α_{j,k} → α_{j,∞}.

Alternately, rapid local mixing looks like:

    φ_k ≈ Σ_{j=1}^c γ_{j,k} s_j,

where s_j is an indicator for nodes in one subchain.
Simon-Ando theory

In chemistry: transient dynamics = transitions among metastable states.

In network analysis: transient dynamics = transitions among communities?

But what if mixing happens so fast we miss the transient?
Random walks and spectrum

Write

    T̂ = (1/2)(I + T)
    T = AD⁻¹ = D^{1/2} Â D^{−1/2}
    Â = D^{−1/2} A D^{−1/2}

We have an eigendecomposition Â = QΛQ^T. Then

    p_k = (D^{1/2}Q) ((I + Λ)/2)^k (Q^T D^{−1/2}) p_0
    φ_k = (D^{−1/2}Q) ((I + Λ)/2)^k (Q^T D^{1/2}) φ_0 / Z_{j,k}
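The factorization can be checked numerically against direct iteration of the lazy walk (toy graph again; the ((I+Λ)/2)^k scaling reflects the laziness factor):

```python
import numpy as np

# Check p_k = D^{1/2} Q ((I+Lambda)/2)^k Q^T D^{-1/2} p_0 against iteration.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
Dh = np.diag(np.sqrt(d))          # D^{1/2}
Dhi = np.diag(1 / np.sqrt(d))     # D^{-1/2}

Ahat = Dhi @ A @ Dhi              # symmetrized walk matrix
lam, Q = np.linalg.eigh(Ahat)     # Ahat = Q Lambda Q^T

p0 = np.zeros(6); p0[0] = 1.0
k = 7

# Direct iteration of the lazy walk That = (I + A D^{-1}) / 2.
That = 0.5 * (np.eye(6) + A / d)
pk = np.linalg.matrix_power(That, k) @ p0

# Via the eigendecomposition.
pk_spec = Dh @ Q @ np.diag(((1 + lam) / 2) ** k) @ Q.T @ Dhi @ p0
assert np.allclose(pk, pk_spec)
print("factorization checks out")
```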
Spectral picture of Simon-Ando

    φ_k = (D^{−1/2}Q) ((I + Λ)/2)^k (Q^T D^{1/2}) φ_0 / Z_{j,k}
        = (1/Z_{j,k}) Σ_{j=1}^n ((1 + λ_j)/2)^k c_j (D^{−1/2} q_j)
        ≈ (1/Z_{j,k}) Σ_{j=1}^c ((1 + λ_j)/2)^k c_j (D^{−1/2} q_j)

- Gap in the spectrum between λ_c and λ_{c+1}.
- After a few steps, (1 + λ_{c+1})^k is negligible, but (1 + λ_c)^k is not. So φ_k lies approximately in the span of D^{−1/2}q_1, ..., D^{−1/2}q_c.
- Treat as a perturbation of the decoupled case, where subchain indicator vectors are eigenvectors for unit eigenvalues: D^{−1/2}q_j ≈ linear combination of indicators, j = 1, ..., c.
Summary so far

Two pieces to spectral community detection:
- Pull out an invariant subspace
- Mine the subspace for community structure

Motivation: optimization or random walk dynamics.

But...
- What about when n and c are both large?
- What if there is no clear spectral gap?
Would like an alternative to invariant subspaces!
Eigenvectors to Ritz vectors

Eigenvectors are stationary points of Rayleigh quotients. Finding stationary points restricted to a subspace gives Ritz vectors.

Usual approach to large-scale eigenproblems:
1. Generate a basis for a Krylov subspace
       K_k(A, x_0) = span{x_0, Ax_0, A²x_0, ..., A^{k−1}x_0}
2. Ritz values rapidly approximate extreme eigenvalues
3. Ritz vectors approximate extreme eigenvectors

Idea: instead of searching an invariant subspace, search in a space spanned by a few scaled Ritz vectors. This pulls out the dynamics of short random walks (vs. long ones).
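A bare-bones Ritz computation from an explicitly built Krylov basis (a sketch only; production codes use Lanczos with reorthogonalization rather than explicit matrix powers):

```python
import numpy as np

# Ritz values of the symmetrized walk matrix Ahat on the toy graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
Ahat = np.diag(d**-0.5) @ A @ np.diag(d**-0.5)

rng = np.random.default_rng(1)
x0 = rng.standard_normal(6)

def ritz(M, x0, k):
    """Ritz values of symmetric M from the Krylov space K_k(M, x0)."""
    V = np.column_stack([np.linalg.matrix_power(M, j) @ x0 for j in range(k)])
    Qb, _ = np.linalg.qr(V)               # orthonormal basis of the Krylov space
    return np.linalg.eigvalsh(Qb.T @ M @ Qb)

lam = np.linalg.eigvalsh(Ahat)
print(ritz(Ahat, x0, 3)[-1], lam[-1])     # largest Ritz value approaches lam_max = 1
```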
Current favorite method

1. Pick “seed” nodes j_1, j_2, ...
2. Take short random walks (length k) from each seed
3. Extract a few Ritz vectors (fewer than k) from span{φ_0, φ_1, ..., φ_{k−1}}
4. Use quadratic programming to find approximate indicators in the subspace spanned by all Ritz vectors
5. Possibly add more seeds and return to step 1
6. Threshold to get an initial indicator approximation
7. Greedily optimize the angle between the indicator and the space
Wang test graph

(figure: the 30-node Wang test graph)
Spectrum for Wang test graph

(figure: eigenvalues by index; values range over roughly [−0.5, 1])
Zachary Karate graph

(figure: the 34-node Zachary karate club network)
Spectrum for Karate

(figure: eigenvalues by index; values range over roughly [−0.5, 1])
Football graph

(figure: the 115-node college football network)
Spectrum for Football

(figure: eigenvalues by index; values range over roughly [0, 1])
Dolphin graph

(figure: the 62-node dolphin social network)
Spectrum for Dolphin

(figure: eigenvalues by index; values range over roughly [−0.5, 1])
Non-overlapping synthetic benchmark (µ = 0.5)

(figure: spy plot of a 1000-node adjacency matrix, nz = 15746)
Spectrum for synthetic benchmark

(figure: leading eigenvalues by index; values range over roughly [0.4, 1])
Score vector

(figure: score by node index)

Score vector for the two-node seed of 492 and 513 in the first LFR benchmark graph. Ten steps, three Ritz vectors.
Non-overlapping synthetic benchmark (µ = 0.6)

(figure: spy plot of a 1000-node adjacency matrix, nz = 15316)
Spectrum for synthetic benchmark

(figure: leading eigenvalues by index; values range over roughly [0.4, 1])
Score vector

(figure: score by node index)

Score vector for the two-node seed of 492 and 513 in the first LFR benchmark graph. Ten steps, three Ritz vectors.
Overlapping synthetic benchmark (µ = 0.3)

- 1000 nodes
- 47 communities
- 500 nodes belong to two communities
Spectrum for synthetic benchmark

(figure: leading eigenvalues by index; values range over roughly [0.4, 1])
Score vector

(figure: score by node index)

Score vector for the two-node seed of 521 and 892. The desired indicator is in red.
Score vector

(figure: score by node index)

Score vector for the two-node seed of 521 and 892 + twelve reseeds. The desired indicator is in red.
Conclusions

Classic spectral methods use eigenvectors to find communities, but:

- We don’t need to stop at partitioning!
  - Overlap is okay
  - Key is how we mine the subspace
- We don’t need to stop at eigenvectors!
  - Can also use Ritz vectors
  - Computation is cheap: short random walks