analysis of social media mld 10-802, lti 11-772 william cohen 2-15-11

Analysis of Social Media
MLD 10-802, LTI 11-772

William Cohen
2-15-11

The “force” on nodes in a graph

• Suppose every node i has a value y_i (IQ, income, ...)
• and neighbors N(i), degree d_i
– If i, j are connected then j exerts a force -K*[y_i - y_j] on i
– Total: $F_i = -K \sum_{j \in N(i)} (y_i - y_j) = -K \left[ d_i y_i - \sum_{j \in N(i)} y_j \right]$
– Matrix notation: F = -K(D-A)y, where D-A is the Laplacian
– Interesting (?) goal: set y so (D-A)y = c*y
– Picture: neighbors pull i up or down, but net force doesn’t change relative positions of nodes
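The force formula can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up 3-node path graph; the function names and the node values are illustrative, not from the lecture. It computes the force both neighbor by neighbor and in matrix form.

```python
import numpy as np

def node_forces(A, y, K=1.0):
    """Net force on every node in matrix form: F = -K (D - A) y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.diag(A.sum(axis=1))  # diagonal degree matrix
    return -K * (D - A) @ y

def node_forces_by_sum(A, y, K=1.0):
    """Same quantity, summed neighbor by neighbor: F_i = -K * sum_j a_ij (y_i - y_j)."""
    n = len(y)
    return np.array([
        -K * sum(A[i][j] * (y[i] - y[j]) for j in range(n))
        for i in range(n)
    ])

# Hypothetical path graph 0 - 1 - 2 with made-up node values.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
y = np.array([1.0, 2.0, 4.0])
F = node_forces(A, y)  # node 0 is pulled up, node 2 is pulled down
```

Because every pairwise force appears once with each sign, the forces sum to zero over the whole graph.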

Spectral Clustering: Graph = Matrix

How do I pick y to be an eigenvector for a block-stochastic matrix?

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”
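One way to read "propagates weights from neighbors": with W = AD^{-1}, each node splits its current weight equally among its neighbors, and W*v collects what arrives. The sketch below assumes NumPy and a made-up 4-node graph (a triangle with one extra node attached); `propagate` is a hypothetical helper name.

```python
import numpy as np

def propagate(A, v, steps=1):
    """Apply v <- W v repeatedly, where W = A D^{-1} (columns sum to 1)."""
    A = np.asarray(A, dtype=float)
    W = A / A.sum(axis=0)  # divide column j by the degree d_j
    v = np.asarray(v, dtype=float)
    for _ in range(steps):
        v = W @ v
    return v

# Hypothetical graph: triangle 0-1-2 plus node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
v1 = np.full(4, 0.25)              # start with equal weight on every node
v2 = propagate(A, v1)              # one propagation step
v_many = propagate(A, v1, steps=200)
```

On this connected, non-bipartite graph the repeated product settles on a vector proportional to the node degrees, which is the top eigenvector of this particular W.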

[Figure, from Shi & Meila, 2002: points labeled x, y, z plotted against eigenvector axes (e2 vs. e3 and e1 vs. e2), axis ticks from -0.4 to 0.4]

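$$
\begin{aligned}
y^T (D - A)\, y &= y^T D y - y^T A y \\
&= \sum_i d_i y_i^2 - \sum_{i,j} a_{i,j}\, y_i y_j \\
&= \frac{1}{2} \Big( \sum_{i,j} a_{i,j}\, y_i^2 + \sum_{i,j} a_{i,j}\, y_j^2 - 2 \sum_{i,j} a_{i,j}\, y_i y_j \Big) \\
&= \frac{1}{2} \sum_{i,j} a_{i,j}\, (y_i - y_j)^2
\end{aligned}
$$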
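The identity between the quadratic form y^T(D-A)y and the pairwise cost (1/2) Σ a_ij (y_i - y_j)^2 can be checked on a small case. This is a sketch assuming NumPy, a symmetric A, and a made-up path graph; the function names are illustrative.

```python
import numpy as np

def laplacian_quadratic_form(A, y):
    """y^T (D - A) y, with D the diagonal degree matrix of A."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.diag(A.sum(axis=1))
    return y @ (D - A) @ y

def pairwise_cost(A, y):
    """(1/2) * sum over all ordered pairs (i, j) of a_ij * (y_i - y_j)^2."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = y[:, None] - y[None, :]
    return 0.5 * np.sum(A * diff ** 2)

# Hypothetical path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
y_real = np.array([1.0, 2.0, 4.0])    # arbitrary real values
y_class = np.array([1.0, 1.0, -1.0])  # +1/-1 class labels, one edge is cut
```

For +1/-1 labels each cut edge contributes (1 - (-1))^2 = 4, so the cost is four times the number of edges that join different classes.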

Another way the Laplacian comes up: it defines a cost function for y, where y assigns nodes to + or – classes so as to keep connected nodes in the same class.
• Turns out: to minimize y^T X y / (y^T y), find the smallest eigenvector of X (the eigenvector with the smallest eigenvalue)
• But: this will not be +1/-1, so it’s a “relaxed” solution
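A small sketch of the relaxed solution, assuming NumPy, X = D-A, and a made-up graph of two triangles joined by one edge. For a connected graph the smallest eigenvalue of D-A is 0 with a constant eigenvector, which puts every node in one class, so this sketch rounds the eigenvector with the second-smallest eigenvalue instead; that rounding step is an assumption, not something stated on the slide.

```python
import numpy as np

def rayleigh(X, y):
    """The ratio y^T X y / (y^T y) that the relaxation minimizes."""
    return (y @ X @ y) / (y @ y)

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = np.diag(A.sum(axis=1)) - A        # unnormalized Laplacian D - A

vals, vecs = np.linalg.eigh(X)        # eigenvalues in ascending order
smallest = vecs[:, 0]                 # constant vector, eigenvalue 0
second = vecs[:, 1]                   # real-valued ("relaxed") class scores
labels = np.sign(second)              # round back to +1/-1
```

The overall sign of an eigenvector is arbitrary, so only the grouping of `labels` is meaningful, not which side is called +1.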

Some more terms

• If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node
– Then D-A is the (unnormalized) Laplacian
– W = AD^{-1} is a probabilistic adjacency matrix
– I-W is the (normalized or random-walk) Laplacian
– etc….
• The largest eigenvectors of W correspond to the smallest eigenvectors of I-W
– So sometimes people talk about “bottom eigenvectors of the Laplacian”
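These definitions translate directly into a few lines. The sketch below assumes NumPy and a made-up 4-node graph; it builds D-A, W = AD^{-1} and I-W, and checks that the two share eigenvectors with eigenvalues related by λ ↦ 1-λ, which is why the top of W is the bottom of I-W.

```python
import numpy as np

# Hypothetical graph: triangle 0-1-2 plus node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)                 # node degrees
D = np.diag(d)

L_unnormalized = D - A            # (unnormalized) Laplacian
W = A @ np.linalg.inv(D)          # probabilistic adjacency matrix, W = A D^{-1}
L_random_walk = np.eye(4) - W     # I - W

w_vals, w_vecs = np.linalg.eig(W) # W is not symmetric, so use eig, not eigh
top = np.argmax(w_vals.real)
v_top = w_vecs[:, top].real       # "largest" eigenvector of W
lam_top = w_vals[top].real
```

Here `v_top` satisfies (I-W) v = (1-λ) v with λ = 1, so it is also the eigenvector of I-W with the smallest eigenvalue, 0.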

[Figure: matrices A and W for two graph constructions: a k-NN graph (easy) and a fully connected graph, weighted by distance]
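The two graph constructions named in the figure can be sketched as follows, assuming NumPy and points given as rows of a coordinate array. The slide does not say how distances become weights; the Gaussian weighting with a `sigma` parameter used here is an assumption, as are the function names and the toy points.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_graph(X, k):
    """Unweighted graph: i and j are linked if either is among the other's k nearest."""
    dist = pairwise_distances(X)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest points
    A = np.zeros((len(X), len(X)))
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                   # symmetrize

def dense_graph(X, sigma=1.0):
    """Fully connected graph; ASSUMED weight exp(-dist^2 / (2 sigma^2))."""
    dist = pairwise_distances(X)
    A = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                    # no self-loops
    return A

# Made-up points: two pairs that are far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
A_knn = knn_graph(X, k=1)
A_dense = dense_graph(X, sigma=1.0)
```

Either matrix can then be normalized into W in the same way as any other adjacency matrix.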



v is an eigenvector of W with eigenvalue λ: $W v = \lambda v$

If W is connected but roughly block diagonal with k blocks then
• the “top” eigenvector is a constant vector
• the next k eigenvectors are roughly piecewise constant with “pieces” corresponding to blocks
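The block structure of the eigenvectors can be seen on a tiny example. This sketch assumes NumPy and a made-up graph with two blocks (two triangles joined by one edge). It uses the row-normalized matrix D^{-1}A rather than AD^{-1}: for symmetric A the two are transposes with the same eigenvalues, and with row normalization the top eigenvector is exactly constant. That choice, and the helper name `top_eigenvectors`, are assumptions of the sketch.

```python
import numpy as np

def top_eigenvectors(A, m):
    """Top m eigenpairs of D^{-1} A, computed via the symmetric D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))       # symmetric, same eigenvalues as D^{-1} A
    vals, U = np.linalg.eigh(S)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:m]    # indices of the m largest
    V = U[:, order] / np.sqrt(d)[:, None] # map back: v = D^{-1/2} u
    return vals[order], V

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

vals, V = top_eigenvectors(A, 2)
v1, v2 = V[:, 0], V[:, 1]   # v1: constant; v2: one sign per block
```

In `v2` the two nodes on the connecting edge sit closer to zero than the rest of their block, which is the "roughly" in "roughly piecewise constant".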



Spectral clustering:
• Find the top k+1 eigenvectors v_1,…,v_{k+1}
• Discard the “top” one
• Replace every node a with the k-dimensional vector x_a = <v_2(a),…,v_{k+1}(a)>
• Cluster with k-means
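The steps above can be sketched end to end, assuming NumPy only. Several choices here are assumptions rather than the lecture's method: eigenvectors are taken from D^{-1}A (via its symmetric form), k-means is a bare Lloyd iteration with a deterministic farthest-point start, and the number of eigenvectors kept (`n_vectors`) is a separate hypothetical parameter from the number of k-means clusters (`n_clusters`). The toy graph is three 4-node cliques joined in a chain.

```python
import numpy as np

def spectral_embedding(A, n_vectors):
    """Rows are x_a = <v_2(a), ..., v_{n_vectors+1}(a)>, using eigenvectors of D^{-1} A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))                 # symmetric form of D^{-1} A
    vals, U = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:n_vectors + 1]  # top n_vectors + 1
    V = U[:, order] / np.sqrt(d)[:, None]
    return V[:, 1:]                                 # discard the "top" one

def kmeans(X, n_clusters, iters=100):
    """Plain Lloyd iterations; centers start from a farthest-point pass."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(A, n_vectors, n_clusters):
    return kmeans(spectral_embedding(A, n_vectors), n_clusters)

def clique_chain(n_blocks, size):
    """Toy graph: n_blocks cliques of the given size, consecutive ones joined by one edge."""
    n = n_blocks * size
    A = np.zeros((n, n))
    for b in range(n_blocks):
        lo, hi = b * size, (b + 1) * size
        A[lo:hi, lo:hi] = 1.0
        if b > 0:
            A[lo - 1, lo] = A[lo, lo - 1] = 1.0
    np.fill_diagonal(A, 0.0)
    return A

A = clique_chain(3, 4)
labels = spectral_clustering(A, n_vectors=2, n_clusters=3)
```

Cluster ids are arbitrary, so results should be compared by grouping (which nodes share a label) rather than by the label values themselves.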

Experimental results: best-case assignment of class labels to clusters

[Figure panels: eigenvectors of W; eigenvectors of a variant of W]
