TRANSCRIPT
Numerical Methods for Data Science: Spectral Network Analysis, Part I
David Bindel, 21 June 2019
Department of Computer Science, Cornell University
Lecture plan
Three threads from “lay of the land” to current research:
• Monday: Latent Factor Models
• Wednesday: Scalable Kernel Methods
• Friday: Spectral Network Analysis
  • 1:30-2:30: Network spectra, optimization, and dynamics
  • 3:00-4:00: Network densities of states
Slides posted on web page (linked from my Cornell page).
Basics of Networks
Networks and Graphs
A graph (network) consists of
• Node (or vertex) set V
• Edge set E ⊂ V × V
• Undirected if (u, v) ∈ E =⇒ (v, u) ∈ E
• Optional edge weights w : E → R
Can also add node weights or edge/node attributes.
Example Networks: Classic CS
Often small and/or highly structured:
• Finite state automata
• Search trees and DAGs
• Graphical models (correlated random variables)
Mostly not the topic for today.
Example Networks: Physical
Connected to physical objects in 2D/3D:
• Rivers, roads, transportation
• Circuits and other electrical networks
• Pipe flow networks
• Computer networks
Example Networks: Citations
http://www.vosviewer.com/
Often directed, some very high-degree nodes, “small world”:
• Web pages, citation networks
• Purchase networks
Other Networks
Lots of others as well!
• Friendship networks
• Interaction networks (phone calls, etc.)
• Food webs
• Protein interactions
• …
The Big Questions: Connectedness
How “well-connected” is the network?
The Big Questions: Geometric Embedding
Is there an underlying geometry to the network?
The Big Questions: Centrality and Ranking
Who are the important players?
The Big Questions: Clustering and Communities
[Figure: example graph with numbered nodes 1-30]

What are the natural clusters or communities?
The Big Questions
One might ask many more questions:
• Graph alignment: Can we map between similar structures?
• Link prediction: Can we extrapolate the pattern?
• Cascade analysis: How does information spread?
• …
Common approach: map to a linear algebra problem!
From Networks to Matrices
[Figure: example graph on nodes 1-7 and its adjacency matrix]

Adjacency matrix A; in the unweighted case,

a_uv = 1 if (u, v) ∈ E, 0 otherwise.

The degree d_u = Σ_v a_uv is the total adjacent edge weight (the number of incident edges in the unweighted case). Distinguish in-degree and out-degree in the directed case.
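As a concrete sketch, here is the adjacency matrix and degree vector for a small hypothetical 4-node graph (not the graph in the figure):

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes: a triangle {0, 1, 2} plus a
# pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency: a_uv = 1 if (u, v) in E, 0 otherwise (store both directions).
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Degree d_u = sum_v a_uv counts incident edges (total weight if weighted).
d = A.sum(axis=1)
print(d)  # node 2 has degree 3, the pendant node 3 has degree 1
```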
From Networks to Matrices
[Figure: the same example graph and its edge-by-node differencing matrix]

Differencing (incidence) matrix G in the unweighted case: for w ∈ V and e ∈ E,

g_ew = 1 if e = (w, v), −1 if e = (u, w), 0 otherwise.
From Networks to Matrices
[Figure: the same example graph and its Laplacian]

Laplacian L = GᵀG = D − A; in the unweighted case,

l_uv = d_u if u = v, −1 if (u, v) ∈ E, 0 otherwise.
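The identity L = GᵀG = D − A is easy to check numerically; a minimal sketch on a hypothetical 4-node graph:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical example graph
n = 4

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
D = np.diag(A.sum(axis=1))

# Differencing matrix: one row per edge, +1 at one endpoint, -1 at the other.
G = np.zeros((len(edges), n))
for e, (u, v) in enumerate(edges):
    G[e, u], G[e, v] = 1.0, -1.0

L = D - A
assert np.allclose(G.T @ G, L)         # L = G^T G = D - A
assert np.allclose(L @ np.ones(n), 0)  # the constant vector is a null vector
```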
A Bestiary of Matrices
• Adjacency matrix: A
• Laplacian matrix: L = D − A
• Unsigned Laplacian: D + A
• Random walk matrix: P = AD⁻¹ (or D⁻¹A)
• Normalized adjacency: Â = D^(−1/2) A D^(−1/2)
• Normalized Laplacian: L̂ = I − Â = D^(−1/2) L D^(−1/2)
• Modularity matrix: B = A − ddᵀ/(2m), m = |E|
• Motif adjacency: W = A² ⊙ A
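A sketch building this zoo with NumPy from a hypothetical adjacency matrix; it takes 2m below to be the sum of degrees (twice the edge count):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical example graph
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

d = A.sum(axis=1)
D = np.diag(d)
Dis = np.diag(1.0 / np.sqrt(d))      # D^{-1/2}

L = D - A                            # Laplacian
Q = D + A                            # unsigned Laplacian
P = A @ np.diag(1.0 / d)             # random walk matrix (column-stochastic)
An = Dis @ A @ Dis                   # normalized adjacency
Ln = np.eye(n) - An                  # normalized Laplacian
B = A - np.outer(d, d) / d.sum()     # modularity matrix, 2m = sum(d)
W = (A @ A) * A                      # motif adjacency: triangles per edge

assert np.allclose(P.sum(axis=0), 1.0)   # columns of P sum to one
assert np.allclose(Ln, Dis @ L @ Dis)    # two forms of normalized Laplacian
```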
Spectral Network Analysis
Three stories of why eigenstuff matters:
• Dynamics and diffusion
• Measure and counting
• Kernels and geometry
Story 1: Dynamics and Diffusion
The Random Walker
Graph with adjacency A (a_ij denotes an edge from j to i); step probabilities

p_ij = a_ij / d_j = probability of a step j → i.

Let w_j(t) denote the probability that a walker is at node j at time t; then

w(t+1) = Pw(t), so w(t) = Pᵗ w(0).

These are the equations of a discrete-time Markov chain ≡ power iteration.
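For instance, on a hypothetical small graph (a triangle with a pendant node, which is connected and non-bipartite, so the walk is ergodic):

```python
import numpy as np

# Simulate the discrete-time walk w(t+1) = P w(t).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)
P = A @ np.diag(1.0 / d)  # p_ij = a_ij / d_j

w = np.zeros(n); w[0] = 1.0  # walker starts at node 0
for _ in range(200):
    w = P @ w  # one step = one matrix-vector product (power iteration)

# For an undirected, connected, non-bipartite graph the stationary
# distribution is proportional to degree: w_inf = d / sum(d).
assert np.allclose(w, d / d.sum(), atol=1e-8)
```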
The Random Walker
Suppose P is diagonalizable and the walk is ergodic. Then

P = VΛV⁻¹, Pᵗ = VΛᵗV⁻¹

with λ₁ = 1 and |λ_j| < 1 for j ≠ 1.

Let w∞ denote the stationary distribution; then

‖Pᵗ − w∞eᵀ‖ ≤ C|λ₂|ᵗ

where e is the vector of all ones. The rate of convergence is determined by the second-largest eigenvalue modulus |λ₂|.
Random Walker to Random Surfer
PageRank: Random walk + “teleport” with probability α
w(t+1) = (1 − α)Pw(t) + αw_ref

We get a nice spectral gap (the non-unit eigenvalues shrink by a factor of 1 − α), hence fast convergence to stationarity.
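A sketch of the iteration, with a uniform teleportation vector as a hypothetical choice of w_ref:

```python
import numpy as np

# PageRank-style walk: step with probability 1-alpha, teleport with alpha.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical example graph
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
P = A @ np.diag(1.0 / A.sum(axis=1))

alpha = 0.15
w_ref = np.full(n, 1.0 / n)  # uniform teleportation distribution

w = w_ref.copy()
for _ in range(100):
    w = (1 - alpha) * (P @ w) + alpha * w_ref

# Fixed-point check: w is stationary for the teleporting walk.
assert np.allclose(w, (1 - alpha) * (P @ w) + alpha * w_ref, atol=1e-6)
assert np.isclose(w.sum(), 1.0)
```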
Convergence Fast and Slow
• Eigenvalue at 1: stationary state
• Size of |λ₂| determines convergence
• What do the other eigenvalues mean?
• Interpreting two eigenvalues near 1 in a dumbbell graph:
  • The distribution rapidly approaches a 2D subspace
  • Then it relaxes more slowly to equilibrium
  • This corresponds to metastable states (cf. Simon-Ando theory)
Dynamics in a Block Model
[Figure: sparsity pattern of a block-model adjacency matrix (nz = 3996) and its low-rank block approximation]
Composite model: A ≈ S diag(β)Sᵀ, S ∈ {0,1}^(n×c)

• Motivation: possibly-overlapping random graphs
• Columns of S are one basis for the range space
• Want to go from some general basis back to S
Spectrum for a Block Model Sample
[Figure: eigenvalues of the block-model sample, eigenvalue vs. index]
Dominant Vectors
[Figure: dominant eigenvectors of the block-model sample]
Same Space, Different Basis
[Figure: another basis for the same invariant subspace]
And Beyond!
Many variations:
• Path counting and powers of A
• Continuous-time walks and exp(−tL)
• Hub/authority importance, HITS, and SVD power iteration
• …
All involve linear time-invariant systems.
Story 2: Measure and Counting
Measurement by Quadratic Forms
Indicate V′ ⊆ V by s ∈ {0,1}ⁿ. Measure the subgraph:

• sᵀAs = |E′| = internal edges
• sᵀDs = edges incident on the subgraph
• sᵀLs = edges between V′ and its complement
• sᵀBs = "surprising" internal edges

where the modularity matrix is B = A − ddᵀ/(2m).
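These quadratic forms are one-liners to check; a sketch on a hypothetical graph (a triangle {0,1,2} with a pendant node 3), taking V′ = {0,1,2}. Note that with both orientations of each undirected edge stored in A, sᵀAs counts each internal edge twice:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
B = A - np.outer(d, d) / d.sum()

s = np.array([1.0, 1.0, 1.0, 0.0])  # indicator of V' = {0, 1, 2}

print(s @ A @ s)  # 6.0: three internal edges, each counted in both directions
print(s @ D @ s)  # 7.0: total degree of the nodes in V'
print(s @ L @ s)  # 1.0: one edge, (2, 3), crosses to the complement
print(s @ B @ s)  # -0.125: slightly fewer internal edges than degrees predict
```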
Graph Bisection
Idea: Find s ∈ {0,1}ⁿ such that eᵀs = n/2 to

• minimize sᵀLs (min cut), or
• maximize sᵀBs (max modularity).

Equivalently (up to constant factors): find s̃ ∈ {±1}ⁿ such that eᵀs̃ = 0 to

• minimize s̃ᵀLs̃, or
• maximize s̃ᵀBs̃.

Oops — NP-hard!
Relax!
Hard: min sᵀLs s.t. eᵀs = 0, s ∈ {±1}ⁿ.
Easy: min vᵀLv s.t. eᵀv = 0, v ∈ Rⁿ, ‖v‖² = n.
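Rounding the relaxed solution gives spectral bisection: take the eigenvector for the second-smallest eigenvalue of L (the Fiedler vector) and split by sign. A sketch on a hypothetical dumbbell (two triangles joined by one edge):

```python
import numpy as np

# Hypothetical "dumbbell": triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

lam, V = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = V[:, 1]            # eigenvector for lambda_2
part = fiedler > 0           # sign cut rounds the relaxation

print(sorted(np.nonzero(part)[0]), sorted(np.nonzero(~part)[0]))
```

The sign cut recovers the two triangles, cutting only the bridge edge.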
Cheeger Inequality
The relaxation gives half of the Cheeger inequality:

2h(G) ≥ λ₂ ≥ h(G)²/2,

where

h(G) = min{ |∂A| / |A| : A ⊂ V, 0 < |A| ≤ |V|/2 }.

This relates bottlenecks to λ₂. It is also relevant to the mixing picture from the dynamics story!
Rayleigh Quotients
• sᵀAs / sᵀs = mean internal degree in the subgraph
• sᵀLs / sᵀs = edges cut between V′ and its complement, per node
• sᵀAs / sᵀDs = fraction of incident edges internal to V′
• sᵀLs / sᵀDs = fraction of incident edges cut
• sᵀBs / sᵀs = mean "surprising" internal degree in the subgraph
• sᵀBs / sᵀDs = mean fraction of internal degree that is surprising
Rayleigh Quotients and Eigenvalues
Basic connection (M spd):

xᵀKx / xᵀMx is stationary at x ⇐⇒ Kx = λMx.

Easy despite the lack of convexity.
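One way to see this computationally: reduce Kx = λMx to a symmetric eigenproblem via a Cholesky factor of M, then check that the Rayleigh quotient at an eigenvector equals its eigenvalue. A sketch with hypothetical random spd K and M:

```python
import numpy as np

rng = np.random.default_rng(2)
def spd(n, C):
    # Build a symmetric positive definite matrix from a random square C.
    return C @ C.T + n * np.eye(n)

K = spd(5, rng.standard_normal((5, 5)))
M = spd(5, rng.standard_normal((5, 5)))

R = np.linalg.cholesky(M)   # M = R R^T, R lower triangular
S = np.linalg.inv(R).T      # so S^T M S = I

# K x = lam M x  <=>  (S^T K S) y = lam y with x = S y.
lam, Y = np.linalg.eigh(S.T @ K @ S)
X = S @ Y                   # generalized eigenvectors

x = X[:, -1]
rq = (x @ K @ x) / (x @ M @ x)  # Rayleigh quotient at a stationary point
assert np.isclose(rq, lam[-1])
assert np.allclose(K @ X, M @ X @ np.diag(lam))
```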
Limits of Rayleigh Quotients
But small variations kill us:

max_{x ≠ 0} xᵀAx / ‖x‖₂² = λmax(A), but

max_{x ≠ 0} xᵀAx / ‖x‖₁² = 1 − ω⁻¹,

where ω is the max clique size (Motzkin-Straus).
Rayleigh Quotients and Eigenproblems
Decompose:

WᵀMW = I and WᵀKW = Λ = diag(λ₁, …, λₙ).

For any x ≠ 0,

xᵀKx / xᵀMx = Σⱼ λⱼ zⱼ², where z = W⁻¹x / ‖W⁻¹x‖₂.

So

sᵀKs / sᵀMs ≈ λmax =⇒ s ≈ Σ_{λⱼ ≈ λmax} wⱼ zⱼ.

So look at invariant subspaces for extreme eigenvalues.
Story 3: Kernels and Geometry
Classical Multi-Dimensional Scaling
Data: squared distances between points:

[D⁽²⁾]ᵢⱼ = d²ᵢⱼ = ‖uᵢ − uⱼ‖² = ‖uᵢ‖² − 2⟨uᵢ, uⱼ⟩ + ‖uⱼ‖².

With rᵢ ≡ ‖uᵢ‖², this is

D⁽²⁾ = r eᵀ − 2UUᵀ + e rᵀ.

Double centering removes the rank-one terms:

B = −(1/2) J D⁽²⁾ J = UUᵀ (for centered data), J ≡ I − (1/n) eeᵀ.

Decompose B = XXᵀ (e.g. via an eigendecomposition) to get coordinates for the data. Note that B looks like a kernel matrix / Gram matrix.
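A compact sketch of the whole pipeline, on a hypothetical random 2-D point cloud:

```python
import numpy as np

# Classical MDS: recover coordinates (up to rotation and translation)
# from squared pairwise distances, via double centering.
rng = np.random.default_rng(0)
U = rng.standard_normal((8, 2))          # hypothetical 2-D point cloud

# [D2]_ij = ||u_i - u_j||^2
D2 = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)

n = len(U)
J = np.eye(n) - np.ones((n, n)) / n      # centering projector
B = -0.5 * J @ D2 @ J                    # Gram matrix of centered points

lam, V = np.linalg.eigh(B)
X = V[:, -2:] * np.sqrt(lam[-2:])        # top-2 eigenpairs give coordinates

# The reconstructed squared distances match the originals.
D2_hat = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
assert np.allclose(D2_hat, D2)
```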
Resistance Distance
Think of resistors on each edge, and consider the net flow between a source at node i and a sink at node j:

d²ᵢⱼ = (eᵢ − eⱼ)ᵀ L† (eᵢ − eⱼ).

This acts like a squared distance — find coordinates!

L = VΛVᵀ =⇒ L† = V diag(0, λ₂⁻¹, …, λₙ⁻¹) Vᵀ.

Truncate to get a low-dimensional embedding.
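A sketch, with a hypothetical helper resistance() built from the pseudoinverse; the embedding coordinates then reproduce the resistance distances exactly:

```python
import numpy as np

# Resistance distance via the Laplacian pseudoinverse, on a hypothetical
# graph: a triangle {0,1,2} with a pendant node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A
Lpinv = np.linalg.pinv(L)

def resistance(i, j):
    e = np.zeros(n); e[i] = 1.0; e[j] = -1.0
    return e @ Lpinv @ e

# Nodes 0 and 1: a direct unit resistor in parallel with the two-edge path
# through node 2, so 1*2/(1+2) = 2/3; node 3 hangs off node 2 by one resistor.
print(resistance(0, 1), resistance(2, 3))

# Embedding: coordinates X with ||x_i - x_j||^2 = resistance distance.
lam, V = np.linalg.eigh(L)
X = V[:, 1:] / np.sqrt(lam[1:])   # drop the null eigenvector
assert np.isclose(np.sum((X[0] - X[1]) ** 2), resistance(0, 1))
```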
Leveraging Geometry
General idea: M is a kernel matrix / Gram matrix over the data.

• Eigendecomposition: M = VΛVᵀ
• Coordinates for node i: (v_ij √λⱼ), j = 1, …, n
• Truncate to get a low-dimensional embedding

Then do processing over the geometry:

• Graph layout (use as node coordinates)
• Geometric partitioning (e.g. inertial methods)
• Geometric clustering (k-means)

Other than resistance distance, what metrics are useful?
Beyond Resistance
What if we don't believe the global geometry?
Isomap Idea
Distance matrix for points:
• Ordinary pairwise distance nearby
• Graph distance far away
And then apply MDS.
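A self-contained sketch on hypothetical data (points along a sine curve), using a 2-nearest-neighbor graph and Floyd-Warshall for the far distances:

```python
import numpy as np

# Isomap sketch: Euclidean distances nearby, graph geodesics far away,
# then classical MDS on the geodesic distances.
t = np.linspace(0, 3 * np.pi, 40)
pts = np.column_stack([t, np.sin(t)])    # hypothetical curve data
n = len(pts)

dist = np.sqrt(np.sum((pts[:, None] - pts[None, :]) ** 2, axis=-1))

# k-nearest-neighbor graph (k = 2), symmetrized; non-edges get infinity.
k = 2
W = np.full((n, n), np.inf)
np.fill_diagonal(W, 0.0)
for i in range(n):
    for j in np.argsort(dist[i])[1:k + 1]:
        W[i, j] = W[j, i] = dist[i, j]

# All-pairs shortest paths (Floyd-Warshall): graph distance for far pairs.
Dgeo = W.copy()
for m in range(n):
    Dgeo = np.minimum(Dgeo, Dgeo[:, [m]] + Dgeo[[m], :])

# Classical MDS on the squared geodesic distances.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (Dgeo ** 2) @ J
lam, V = np.linalg.eigh(B)
coords = V[:, -2:] * np.sqrt(np.maximum(lam[-2:], 0))  # 2-D embedding

# The geodesic between the endpoints follows the curve, so it is much
# longer than the straight-line distance between them.
assert Dgeo[0, -1] > dist[0, -1] + 1.0
```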
Summary and Preview
The Three Stories
Three stories of why eigenstuff matters:
• Dynamics and diffusion
• Measure and counting
• Kernels and geometry
The Missing Link
So far, focus is the extreme eigenpairs:
• Metastable states in the dynamics story
• Relaxed solutions in the measure/counting story
• Reconstructed coordinates in the geometric story
What about the interior of the spectrum?
The Missing Link
[Figure: interior eigenvalue histograms (spectral densities) for several networks]