Analysis of Social Media MLD 10-802, LTI 11-772 William Cohen 2-15-11


Analysis of Social Media
MLD 10-802, LTI 11-772

William Cohen
2-15-11


The “force” on nodes in a graph

• Suppose every node has a value (IQ, income, ...) y(i)
– Each node i has value y_i …
• and neighbors N(i), degree d_i
– If i, j are connected then j exerts a force -K*[y_i - y_j] on i
– Total: $F_i = -K \sum_{j \in N(i)} (y_i - y_j) = -K \left[ d_i y_i - \sum_{j \in N(i)} y_j \right]$
– Matrix notation: F = -K(D-A)y, where D-A is the Laplacian
– Interesting (?) goal: set y so (D-A)y = c*y
– Picture: neighbors pull i up or down, but the net force doesn’t change the relative positions of nodes
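As a quick check that the node-by-node sum and the matrix form F = -K(D-A)y agree, here is a minimal NumPy sketch. The 4-node path graph, the node values, and the constant K are made-up illustration data, not taken from the slides.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3 (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 2.0, 4.0, 7.0])   # node values y_i
K = 0.5                               # arbitrary constant K

D = np.diag(A.sum(axis=1))            # diagonal degree matrix

# Node by node: F_i = -K * sum over neighbors j of (y_i - y_j)
F_loop = np.array([-K * sum(y[i] - y[j] for j in np.flatnonzero(A[i]))
                   for i in range(len(y))])

# Matrix form: F = -K (D - A) y
F_matrix = -K * (D - A) @ y

print(F_loop)
print(F_matrix)
```

Both computations give the same vector, and the forces sum to zero because every pull of j on i is matched by an opposite pull of i on j.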


Spectral Clustering: Graph = Matrix

How do I pick y to be an eigenvector for a block-stochastic matrix?


Spectral Clustering: Graph = Matrix

W*v1 = v2 “propagates weights from neighbors”

[Figure: matrix M and scatter plots of nodes labeled x, y, and z; axis labels e1, e2, e3; axis ticks from -0.4 to 0.4] [Shi & Meila, 2002]


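$$
\begin{aligned}
y^T (D - A)\, y &= y^T D y - y^T A y \\
&= \sum_i d_i y_i^2 - \sum_{i,j} a_{ij} y_i y_j \\
&= \frac{1}{2} \left( \sum_i d_i y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_j d_j y_j^2 \right) \\
&= \frac{1}{2} \left( \sum_{i,j} a_{ij} y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_{i,j} a_{ij} y_j^2 \right) \\
&= \frac{1}{2} \sum_{i,j} a_{ij} (y_i - y_j)^2
\end{aligned}
$$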

Another way the Laplacian comes up: it defines a cost function for y, where y assigns nodes to + or – classes so as to keep connected nodes in the same class.
• Turns out: to minimize y^T X y / (y^T y), find the smallest eigenvector of X (the eigenvector with the smallest eigenvalue)
• But: this will not be +1/-1, so it’s a “relaxed” solution
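The cost view can be checked numerically. The sketch below assumes a small made-up graph (two triangles joined by one edge) and takes X = D-A. It compares the quadratic form with the pairwise sum ½ Σ a_ij (y_i - y_j)², then looks at the relaxed solution. One caveat that is easy to verify: for D-A the very smallest eigenvector is the constant vector with eigenvalue 0, so the sketch reads the split off the next one.

```python
import numpy as np

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A

def pairwise_cost(y):
    """0.5 * sum_ij a_ij (y_i - y_j)^2: penalizes connected nodes that disagree."""
    diff = y[:, None] - y[None, :]
    return 0.5 * np.sum(A * diff ** 2)

y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # a +1/-1 labeling
quad = y @ L @ y            # y^T (D-A) y
pair = pairwise_cost(y)     # same number, computed edge by edge

# Relaxation: allow real-valued y and use eigenvectors of L.
vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
relaxed = vecs[:, 1]              # first non-constant eigenvector
labels = np.sign(relaxed)         # round the relaxed solution back to +/-

print(quad, pair)
print(vals[:2])
print(labels)
```

On this graph the signs of the relaxed vector separate the two triangles, although the overall sign (which triangle gets +) is arbitrary.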


Some more terms

• If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node
– Then D-A is the (unnormalized) Laplacian
– $W = AD^{-1}$ is a probabilistic adjacency matrix
– I-W is the (normalized or random-walk) Laplacian
– etc….
• The largest eigenvectors of W correspond to the smallest eigenvectors of I-W
– So sometimes people talk about “bottom eigenvectors of the Laplacian”
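A small NumPy sketch of these definitions follows, using a made-up symmetric weighted adjacency matrix. It also checks the eigenvalue correspondence: since (I-W)v = (1-λ)v whenever Wv = λv, the two matrices share eigenvectors and the eigenvalue order flips.

```python
import numpy as np

# Hypothetical weighted, symmetric adjacency matrix (illustrative data only).
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)                  # (weighted) degree of each node
D = np.diag(d)

L_unnorm = D - A                   # unnormalized Laplacian
W = A @ np.linalg.inv(D)           # W = A D^{-1}: every column sums to 1
L_rw = np.eye(len(A)) - W          # normalized (random-walk) Laplacian

# Eigenvalues are real here because A is symmetric; .real drops any zero
# imaginary parts that the general-purpose solver may attach.
w_vals = np.sort(np.linalg.eigvals(W).real)[::-1]   # largest first
l_vals = np.sort(np.linalg.eigvals(L_rw).real)      # smallest first

print(w_vals)
print(l_vals)
```

The largest eigenvalue of W is 1, which lines up with the eigenvalue 0 at the bottom of I-W.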


[Figure: panels labeled A and W, shown twice]

K-nn graph (easy)

Fully connected graph, weighted by distance
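The two graph constructions named here can be sketched as follows. The data points, the choice k=2, and in particular the Gaussian weight exp(-d²/(2σ²)) for "weighted by distance" are assumptions made for illustration; the slide does not say which weighting was used. The helper names `knn_graph` and `full_graph` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated 2-D blobs of 5 points each.
X = np.vstack([rng.normal(0.0, 0.3, size=(5, 2)),
               rng.normal(5.0, 0.3, size=(5, 2))])

# Pairwise Euclidean distances between all points.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def knn_graph(dist, k):
    """Symmetric 0/1 adjacency: i~j if j is among i's k nearest, or vice versa."""
    n = len(dist)
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = [j for j in order if j != i][:k]
        A[i, neighbors] = 1.0
    return np.maximum(A, A.T)

def full_graph(dist, sigma):
    """Fully connected graph with assumed Gaussian weights exp(-d^2 / (2 sigma^2))."""
    A = np.exp(-dist ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

A_knn = knn_graph(dist, k=2)
A_full = full_graph(dist, sigma=1.0)

# W = A D^{-1}: divide column j by the degree d_j.
W_knn = A_knn / A_knn.sum(axis=0)
W_full = A_full / A_full.sum(axis=0)
```

With blobs this far apart, the k-nn graph has no edges between the two groups, while the fully connected graph keeps tiny but nonzero weights between them.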



Spectral Clustering: Graph = Matrix

W*v1 = v2 “propagates weights from neighbors”

v is an eigenvector with eigenvalue λ: $W v = \lambda v$

If W is connected but roughly block diagonal with k blocks then
• the “top” eigenvector is a constant vector
• the next k eigenvectors are roughly piecewise constant with “pieces” corresponding to blocks

Spectral clustering:
• Find the top k+1 eigenvectors v1,…,vk+1
• Discard the “top” one
• Replace every node a with k-dimensional vector xa = <v2(a),…,vk+1(a)>
• Cluster with k-means
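The recipe above can be sketched in NumPy as follows. Several things here are assumptions rather than the slides' method: W is taken as the row-normalized D^{-1}A (for which the top eigenvector is exactly constant), the eigenvectors are obtained through the symmetric matrix D^{-1/2} A D^{-1/2} to keep the arithmetic real, and k-means is a bare Lloyd's loop with farthest-point initialization. The toy graph has three blocks, for which only two eigenvectors after the constant one are piecewise constant, so the demo keeps k=2 eigenvectors and asks for three clusters; that pairing is a choice of this example. The names `kmeans` and `spectral_clustering` are hypothetical.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(n_clusters - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_clustering(A, k, n_clusters):
    """Embed nodes with eigenvectors v2..v(k+1) of W, then run k-means."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Assumption: W = D^{-1} A. Its eigenvectors are D^{-1/2} u, where u are
    # the eigenvectors of the symmetric matrix S = D^{-1/2} A D^{-1/2}.
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, U = np.linalg.eigh(S)               # ascending eigenvalues
    V = d_inv_sqrt[:, None] * U[:, ::-1]      # eigenvectors of W, largest first
    X = V[:, 1:k + 1]                         # discard the top one, keep the next k
    return kmeans(X, n_clusters), V

# Hypothetical toy graph: 3 blocks of 4 nodes, strong links inside a block
# (weight 1.0), weak links between blocks (weight 0.01).
n, size = 12, 4
block = np.arange(n) // size
A = np.where(block[:, None] == block[None, :], 1.0, 0.01)
np.fill_diagonal(A, 0.0)

labels, V = spectral_clustering(A, k=2, n_clusters=3)
print(labels)
```

The cluster ids themselves are arbitrary; what matters is that nodes in the same block receive the same id.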


Experimental results: best-case assignment of class labels to clusters

[Figure panels: Eigenvectors of W; Eigenvectors of variant of W]
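The slide does not spell out how the "best-case assignment" is computed. One plausible reading, assumed here, is to try every one-to-one mapping from cluster ids to class labels and report the accuracy of the best one. The function name `best_case_accuracy` and the label vectors below are made up for illustration.

```python
from itertools import permutations
import numpy as np

def best_case_accuracy(true_labels, cluster_ids, n_classes):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels.

    Brute force over all permutations, so only practical for a few classes.
    """
    true_labels = np.asarray(true_labels)
    cluster_ids = np.asarray(cluster_ids)
    best = 0.0
    for perm in permutations(range(n_classes)):
        mapped = np.array(perm)[cluster_ids]      # cluster c -> class perm[c]
        best = max(best, float(np.mean(mapped == true_labels)))
    return best

truth    = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [2, 2, 2, 0, 0, 1, 1, 1, 1]            # hypothetical clustering output
acc = best_case_accuracy(truth, clusters, n_classes=3)
print(acc)
```

Here the best mapping sends cluster 2 to class 0, cluster 0 to class 1, and cluster 1 to class 2, which gets 8 of the 9 nodes right.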