
EE 378B: Inference, Estimation, and Information Processing

Lecture 4

Andrea Montanari

Stanford University

April 8, 2015

Andrea Montanari (Stanford University) EE 378B: Lecture 4 April 8, 2015 1 / 53

Outline

1 A reminder

2 No, seriously, why does this work?

3 Examples


ALERT: A LOT OF (COOL) LINEAR ALGEBRA!!!!


ALERT: PERTURBATION THEORY FOR LINEAR OPERATORS!!!!


A reminder


Laplacian

Similarity matrix

$A_{ij}$ = similarity between $i$ and $j$ ($= A_{ji}$)

Degree matrix

$$D = \mathrm{diag}(d), \qquad d_i = \sum_{j \in [n]} A_{ij}$$

(Normalized) Laplacian matrix

$$L_n = I - D^{-1/2} A D^{-1/2}$$
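The definitions on this slide translate almost directly into NumPy. The following is a minimal sketch, assuming a dense symmetric similarity matrix with strictly positive degrees; the function name `normalized_laplacian` and the 3-node demo graph are illustrative choices, not part of the lecture.

```python
import numpy as np

def normalized_laplacian(A):
    """Return L_n = I - D^{-1/2} A D^{-1/2} for a symmetric similarity matrix A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # degrees d_i = sum_j A_ij
    if np.any(d <= 0):
        raise ValueError("every node needs positive degree")
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # D^{-1/2} A D^{-1/2} has entries A_ij / sqrt(d_i d_j)
    return np.eye(len(d)) - inv_sqrt_d[:, None] * A * inv_sqrt_d[None, :]

# Hypothetical demo graph: a path on 3 nodes
A_demo = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
L_demo = normalized_laplacian(A_demo)
eigvals_demo = np.linalg.eigvalsh(L_demo)  # ascending eigenvalues
```

For a connected graph the smallest eigenvalue should be numerically zero, with eigenvector proportional to the square roots of the degrees.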


The algorithm

Spectral Clustering

Input: Similarity matrix $A \in \mathbb{R}^{n \times n}$

Output: $k$ clusters $S_1, \dots, S_k$

1: Compute the Laplacian $L_n = I - D^{-1/2} A D^{-1/2}$;

2: Compute the first $k$ eigenvectors of $L_n$, $u_1, \dots, u_k$;

3: For each $i \in \{1, \dots, n\}$

4: Let $y_i = (u_{1,i}, \dots, u_{k,i}) \in \mathbb{R}^k$

5: Set $x_i = y_i / \|y_i\|$

6: Cluster $\{x_1, \dots, x_n\}$ using K-Means
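The steps of the algorithm can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the lecture's implementation: the K-Means step is a plain Lloyd iteration with a farthest-point initialization (my choice, to keep the example deterministic), and the names `spectral_clustering`, `A_demo`, and `labels_demo` are hypothetical.

```python
import numpy as np

def spectral_clustering(A, k, n_iter=50):
    """Sketch of the algorithm above; returns an array of labels in {0, ..., k-1}."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - s[:, None] * A * s[None, :]      # step 1: Laplacian
    _, vecs = np.linalg.eigh(L)                            # eigenvalues ascending
    Y = vecs[:, :k]                                        # step 2: first k eigenvectors
    # Steps 3-5: normalize each row (assumes no row of Y is exactly zero)
    X = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Step 6: Lloyd's K-Means with farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical demo: two disconnected triangles
tri = np.ones((3, 3)) - np.eye(3)
A_demo = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
labels_demo = spectral_clustering(A_demo, k=2)
```

On the demo graph the two triangles should receive two different labels; which triangle gets label 0 is arbitrary.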


Consequences

Proposition

(1) $L_n \succeq 0$

(2) If $G$ is connected, then $0 = \lambda_1(L_n) < \lambda_2(L_n) \le \cdots \le \lambda_n(L_n)$ and. . .

(3) . . . $u_{1,i} = C \sqrt{d_i}$.


k disconnected components

$[n] = S_1 \cup S_2 \cup \cdots \cup S_k$, $A_{S_i, S_j} = 0$

$$L_n = \begin{pmatrix} L_n^{(1)} & 0 & 0 & 0 \\ 0 & L_n^{(2)} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & L_n^{(k)} \end{pmatrix}$$


Subtlety. . .

Compute first k eigenvectors.

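$\tilde{Y} = [u_1 | \cdots | u_k] \in \mathbb{R}^{n \times k}$

$\tilde{Y} = Y R$, $R \in \mathbb{R}^{k \times k}$, $R^* R = I$

$x_i = y_i / \|y_i\| = (1, 0, 0, 0, 0) R$ for $i \in S_1$

$x_i = y_i / \|y_i\| = (0, 1, 0, 0, 0) R$ for $i \in S_2$

$x_i = y_i / \|y_i\| = (0, 0, 1, 0, 0) R$ for $i \in S_3$

. . .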


Not too bad!


No, seriously, why does this work?


Idea

G ‘close to’ k disconnected components


Let us try to formalize

$L_n \in \mathbb{R}^{n \times n}$ normalized Laplacian

$E_0 = [u_1, \dots, u_k] \in \mathbb{R}^{n \times k}$, $E_0^* E_0 = I_{k \times k}$, first $k$ eigenvectors

Meta theorem

If $L_n'$ is close to $L_n$ then $E_0'$ is close to $E_0$.


‘Close to’

$M \in \mathbb{R}^{m \times n}$

$$\|M\|_2 = \sigma_{\max}(M) = \max\{\|M u\| : \|u\| = 1\} = \max\{\langle v, M u \rangle : \|u\| = 1, \|v\| = 1\}$$

(orthogonally invariant: $\|M\| = \|R M S\|$ for $R R^* = S S^* = I$)
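The three characterizations of the operator norm, and its orthogonal invariance, are easy to check numerically. A small sketch, assuming NumPy; the random matrix and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

op_norm = np.linalg.norm(M, 2)                 # spectral (operator) norm
U, sing, Vt = np.linalg.svd(M)                 # singular values in descending order
sigma_max = sing[0]
bilinear = U[:, 0] @ M @ Vt[0]                 # <v, M u> at the top singular vectors

# Random orthogonal R (5x5) and S (3x3) obtained from QR factorizations
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))
S, _ = np.linalg.qr(rng.standard_normal((3, 3)))
rotated_norm = np.linalg.norm(R @ M @ S, 2)
```

All four quantities (`op_norm`, `sigma_max`, `bilinear`, `rotated_norm`) should agree up to rounding.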


Let us try to formalize, again

Two matrices $A$, $A + H \in \mathbb{R}^{n \times n}$ symmetric.

$E_0, F_0 \in \mathbb{R}^{n \times k}$ orthogonal

$E_0, F_0$ bases of eigenspaces:

$$A E_0 = E_0 A_0, \qquad (A + H) F_0 = F_0 B_0$$

Meta Theorem

$$d(E_0, F_0) \le \text{some function of}(\|H\|_2)$$


Let us try to formalize, again

$d(E_0, F_0)$ must depend only on the space spanned by the columns of $E_0, F_0$

$$d_p(E_0, F_0) \equiv \|E_0 E_0^* - F_0 F_0^*\|_2$$

Lemma (1)

If $F = [F_0 | F_1] \in \mathbb{R}^{n \times n}$ with $F^* F = I$, then

$$d_p(E_0, F_0) = \|F_1^* E_0\|_2 = \|F_0^* E_1\|_2 .$$
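Lemma (1) can be sanity-checked on random subspaces. The sketch below assumes NumPy and draws two random orthogonal matrices via QR; the dimensions and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2

# Two random orthogonal matrices, split into the first k and last n-k columns
E, _ = np.linalg.qr(rng.standard_normal((n, n)))
F, _ = np.linalg.qr(rng.standard_normal((n, n)))
E0, E1 = E[:, :k], E[:, k:]
F0, F1 = F[:, :k], F[:, k:]

d_p = np.linalg.norm(E0 @ E0.T - F0 @ F0.T, 2)   # distance between projectors
norm_F1E0 = np.linalg.norm(F1.T @ E0, 2)
norm_F0E1 = np.linalg.norm(F0.T @ E1, 2)
```

The three numbers should coincide up to floating-point error, as the lemma states.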


Another way to look at it: Principal angles

$E_0^* F_0 = U \cos\Theta \, V^*$, $U, V \in O(k, k)$

$O(m, n) = \{ Q \in \mathbb{R}^{m \times n} : Q^* Q = I_{n \times n} \}$

$\Theta = \mathrm{diag}(\theta_1, \dots, \theta_k)$

(for instance, if $k = 1$, $e_0^* f_0 = \cos\theta$)


Another way to look at it: Principal angles

$E_0^* F_0 = U \cos\Theta \, V^*$, $U, V \in O(k)$

Lemma (2)

$$\|E_0^* F_1\|_2 = \|\sin\Theta\|_2 = \max\{ |\sin\theta_1|, \dots, |\sin\theta_k| \}$$

Lemma (1) + Lemma (2) $\Rightarrow$ $d_p(E_0, F_0) = \|\sin\Theta\|_2$
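Principal angles can be read off from the singular values of $E_0^* F_0$, which makes Lemma (2) and its consequence easy to verify numerically. A minimal sketch, assuming NumPy and random subspaces chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 3
E0, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k, orthonormal columns
F, _ = np.linalg.qr(rng.standard_normal((n, n)))
F0, F1 = F[:, :k], F[:, k:]

# Singular values of E0^* F0 are the cosines of the principal angles
cosines = np.clip(np.linalg.svd(E0.T @ F0, compute_uv=False), 0.0, 1.0)
sines = np.sqrt(1.0 - cosines**2)

sin_theta_norm = sines.max()                          # ||sin(Theta)||_2
lemma2_lhs = np.linalg.norm(E0.T @ F1, 2)             # ||E0^* F1||_2
d_p = np.linalg.norm(E0 @ E0.T - F0 @ F0.T, 2)        # projector distance
```

Both `lemma2_lhs` and `d_p` should match `sin_theta_norm` up to rounding.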


Proof of Lemma (2)


Proof of Lemma (1)


Yet another distance

$$d_c(E_0, F_0) = \min_{Q, R \in O(k)} \|E_0 Q - F_0 R\|_2 = \min_{R \in O(k)} \|E_0 - F_0 R\|_2$$

Lemma

$$d_c(E_0, F_0) = \|2 \sin(\Theta / 2)\|_2 .$$

Corollary

$$d_p(E_0, F_0) \le d_c(E_0, F_0) \le \sqrt{2} \, d_p(E_0, F_0) .$$
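One way to see the Lemma in action: with $E_0^* F_0 = U \cos\Theta \, V^*$, the rotation $R = V U^*$ gives $(E_0 - F_0 R)^*(E_0 - F_0 R) = U (2 - 2\cos\Theta) U^*$, so the objective at this $R$ equals $\|2\sin(\Theta/2)\|_2$. The sketch below checks this and the Corollary numerically. It assumes NumPy, and the choice of $R$ is the one just described rather than the output of a generic minimizer.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
E0, _ = np.linalg.qr(rng.standard_normal((n, k)))
F0, _ = np.linalg.qr(rng.standard_normal((n, k)))

U, cosines, Vt = np.linalg.svd(E0.T @ F0)        # E0^* F0 = U cos(Theta) V^*
theta = np.arccos(np.clip(cosines, 0.0, 1.0))    # principal angles in [0, pi/2]

R = Vt.T @ U.T                                   # aligning rotation R = V U^*
objective_at_R = np.linalg.norm(E0 - F0 @ R, 2)
lemma_value = np.max(2.0 * np.sin(theta / 2.0))  # ||2 sin(Theta/2)||_2
d_p = np.max(np.sin(theta))                      # ||sin(Theta)||_2
```

The Corollary then reads `d_p <= lemma_value <= sqrt(2) * d_p`, which holds because all principal angles lie in $[0, \pi/2]$.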


Proof of the Lemma


Finally, a theorem! The setting

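Two matrices $A$, $A + H \in \mathbb{R}^{n \times n}$ symmetric.

$E_0, F_0 \in \mathbb{R}^{n \times k}$, $E_1, F_1 \in \mathbb{R}^{n \times (n - k)}$

$E = [E_0 | E_1]$, $F = [F_0 | F_1] \in \mathbb{R}^{n \times n}$ orthogonal

$E_0, F_0$ bases of eigenspaces:

$$A = E \begin{pmatrix} A_0 & 0 \\ 0 & A_1 \end{pmatrix} E^* , \qquad A + H = F \begin{pmatrix} B_0 & 0 \\ 0 & B_1 \end{pmatrix} F^* .$$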


Finally, a theorem!

Theorem (Davis, Kahan sin theta theorem)

Assume $\mathrm{eval}(A_0) \subseteq [a, b]$, $\mathrm{eval}(B_1) \subseteq (-\infty, a - \delta) \cup (b + \delta, \infty)$. Then

$$d_p(E_0, F_0) \le \frac{1}{\delta} \|H\|_2 .$$
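A numerical illustration of the theorem, assuming NumPy. The test matrix is constructed (my choice) so that $k$ eigenvalues of $A$ lie in $[0, 0.2]$ and the rest in $[1, 2]$, and the perturbation is rescaled to $\|H\|_2 = 0.1$, which guarantees a positive gap.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 3

# A with k eigenvalues in [0, 0.2] and the other n-k in [1, 2]
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals_A = np.concatenate([rng.uniform(0.0, 0.2, k), rng.uniform(1.0, 2.0, n - k)])
A = Q @ np.diag(evals_A) @ Q.T
A = (A + A.T) / 2.0                      # remove rounding asymmetry

# Symmetric perturbation rescaled to operator norm 0.1
G = rng.standard_normal((n, n))
H = (G + G.T) / 2.0
H *= 0.1 / np.linalg.norm(H, 2)

wA, VA = np.linalg.eigh(A)               # ascending eigenvalues
wB, VB = np.linalg.eigh(A + H)
E0, F0 = VA[:, :k], VB[:, :k]            # bottom-k eigenspaces

b = wA[k - 1]                            # eval(A_0) lies in [wA[0], b]
delta = wB[k] - b                        # eval(B_1) lies at or above b + delta
d_p = np.linalg.norm(E0 @ E0.T - F0 @ F0.T, 2)
bound = np.linalg.norm(H, 2) / delta     # right-hand side of the theorem
```

Here `d_p` should not exceed `bound`; in typical runs it is noticeably smaller, since the theorem is a worst-case statement.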


Let us apply it to our case

$L_n$, eigenvalues $\lambda_1, \lambda_2, \dots$

$L_n'$, eigenvalues $\lambda_1', \lambda_2', \dots$

Theorem

Letting $Y = [u_1 | \dots | u_k]$, $Y' = [u_1' | \dots | u_k']$, we have

$$d_c(Y, Y') \le \frac{\sqrt{2}}{\lambda_{k+1}'} \|L_n' - L_n\|_2 .$$


More explicitly

$L_n$, eigenvalues $\lambda_1, \lambda_2, \dots$

$L_n'$, eigenvalues $\lambda_1', \lambda_2', \dots$

Theorem

There exists $Q \in O(k)$ such that, letting $Y = [u_1 | \dots | u_k]$, $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\|Y - Y'\|_2 \le \frac{\sqrt{2}}{\lambda_{k+1}'} \|L_n' - L_n\|_2 .$$


Even more explicitly

$L_n$, eigenvalues $\lambda_1, \lambda_2, \dots$

$L_n'$, eigenvalues $\lambda_1', \lambda_2', \dots$

$y_i$ rows of $Y$

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - y_i'\|^2 = \frac{1}{n} \|Y - Y'\|_F^2 \le \frac{k}{n} \|Y - Y'\|_2^2$$

Theorem

There exists $Q \in O(k)$ such that, letting $Y = [u_1 | \dots | u_k]$, $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - y_i'\|^2 \le \frac{2k}{n (\lambda_{k+1}')^2} \|L_n' - L_n\|_2^2 .$$

. . . $\lambda_{k+1}'$ is tricky. . .

A usable theorem

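$L_n$, eigenvalues $\lambda_1, \lambda_2, \dots$

Theorem

There exists $Q \in O(k)$ such that, letting $Y = [u_1 | \dots | u_k]$, $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - y_i'\|^2 \le \frac{2k}{n (\lambda_{k+1} - \|L_n' - L_n\|_2)^2} \|L_n' - L_n\|_2^2 .$$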


EXAMPLES!!!!


First example

$$A = \begin{pmatrix} 0 & 0 \\ 0 & a \end{pmatrix} , \qquad H = \begin{pmatrix} 0 & h \\ h & 0 \end{pmatrix}$$

$n = 2$, $k = 1$, $\|H\|_2 = h$,


Let’s apply the sin theta theorem

$E_0 = [e_0]$ = eigenspace with smallest eigenvalue of $A$

$F_0 = [f_0]$ = eigenspace with smallest eigenvalue of $B = A + H$

$\delta$ = ???

Need to compute the eigenvalues of $A + H$


Eigenvalues

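$$A + H = \begin{pmatrix} 0 & h \\ h & a \end{pmatrix} , \qquad \lambda_{1,2} = \frac{a}{2} \Big\{ 1 \pm \sqrt{1 + \frac{4h^2}{a^2}} \Big\} .$$

Can take $\delta = a$ (a bit more if $h > 0$. . . )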


Sin Theta

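$$\sin\theta = \sqrt{1 - \langle e_0, f_0 \rangle^2} \le \frac{\|H\|_2}{\delta} \le \frac{h}{a} .$$

Explicit calculation ($\tilde{h} = h/a$, $\Delta = \sqrt{1 + 4\tilde{h}^2}$)

$e_0 = (1, 0)$

$$f_0 = \frac{1}{(2\Delta + 2\Delta^2)^{1/2}} \big( 1 + \Delta, \, -2\tilde{h} \big) ,$$

$$\langle f_0, e_0 \rangle = \frac{1 + \Delta}{(2\Delta + 2\Delta^2)^{1/2}} = 1 - \frac{h^2}{2a^2} + O(h^4)$$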

Not bad!!
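The closed-form expressions for this $2 \times 2$ example can be compared against a numerical eigendecomposition. A short sketch, assuming NumPy; the values $a = 1$, $h = 0.1$ are arbitrary illustrative choices with $0 < h < a$.

```python
import numpy as np

a, h = 1.0, 0.1                       # illustrative values, 0 < h < a
B = np.array([[0.0, h], [h, a]])      # the perturbed matrix A + H

w, V = np.linalg.eigh(B)              # ascending eigenvalues
f0 = V[:, 0]                          # eigenvector of the smallest eigenvalue
e0 = np.array([1.0, 0.0])

h_t = h / a
Delta = np.sqrt(1.0 + 4.0 * h_t**2)
closed_form_eigs = a / 2.0 * np.array([1.0 - Delta, 1.0 + Delta])
closed_form_inner = (1.0 + Delta) / np.sqrt(2.0 * Delta + 2.0 * Delta**2)

inner = abs(e0 @ f0)                  # the sign of an eigenvector is arbitrary
sin_theta = np.sqrt(1.0 - inner**2)
sin_theta_bound = h / a               # the sin theta bound with delta = a
```

For small $h/a$ the computed `sin_theta` sits just below `sin_theta_bound`, which is why the bound is nearly tight in this example.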


Using our usable theorem. . .

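Theorem

There exists $Q \in O(k)$ such that, letting $Y = [u_1 | \dots | u_k]$, $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - y_i'\|^2 \le \frac{2k}{n (\lambda_{k+1} - \|H\|_2)^2} \|H\|_2^2 .$$

Applying it to our case: $k = 1$, $\lambda_2 = a$, $\|H\|_2 = h$

$$\frac{1}{2} \|e_0 - f_0\|^2 \le \frac{h^2}{(a - h)^2} , \qquad \langle e_0, f_0 \rangle \ge 1 - \frac{h^2}{(a - h)^2} .$$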


Slightly more interesting example

Similarity matrix

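$$A = \begin{pmatrix}
p & p & p & q & q & q & q & q & q \\
p & p & p & q & q & q & q & q & q \\
p & p & p & q & q & q & q & q & q \\
q & q & q & p & p & p & q & q & q \\
q & q & q & p & p & p & q & q & q \\
q & q & q & p & p & p & q & q & q \\
q & q & q & q & q & q & p & p & p \\
q & q & q & q & q & q & p & p & p \\
q & q & q & q & q & q & p & p & p
\end{pmatrix}$$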



Laplacian

$I - L_n' = \frac{1}{qn + (p - q)n/k} \times$ [matrix shown as an image on the slide]



Laplacian

$\tilde{L}_n' =$ [matrix shown as an image on the slide]



‘Unperturbed’ Laplacian

$$\tilde{L}_n = \begin{pmatrix}
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q
\end{pmatrix}$$

Top eigenvectors

$u_1 = (1, 1, 1, 0, 0, 0, 0, 0, 0)$

$u_2 = (0, 0, 0, 1, 1, 1, 0, 0, 0)$

$u_3 = (0, 0, 0, 0, 0, 0, 1, 1, 1)$


The perturbation

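$$\tilde{L}_n' - \tilde{L}_n = \begin{pmatrix}
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q \\
q & q & q & q & q & q & q & q & q
\end{pmatrix}$$

$$\|\tilde{L}_n - \tilde{L}_n'\|_2 = qn$$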


Gap of the unperturbed Laplacian

$$\tilde{L}_n = \begin{pmatrix}
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
p-q & p-q & p-q & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & p-q & p-q & p-q & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q \\
0 & 0 & 0 & 0 & 0 & 0 & p-q & p-q & p-q
\end{pmatrix}$$

Eigenvalues (recall that we rescaled and added a multiple of the identity)

$$\lambda_1(\tilde{L}_n) = \cdots = \lambda_k(\tilde{L}_n) = (p - q) \frac{n}{k} ,$$

$$\lambda_{k+1}(\tilde{L}_n) = \cdots = \lambda_n(\tilde{L}_n) = 0 ,$$

$$\lambda_1(\tilde{L}_n) - \lambda_{k+1}(\tilde{L}_n) = (p - q) n / k$$

Using our usable theorem

Theorem

There exists $Q \in O(k)$ such that, letting $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i' - e_{\mathrm{Cluster}(i)}\|^2 \le \frac{2k}{n (\lambda_1 - \lambda_{k+1} - \|L_n' - L_n\|_2)^2} \|L_n' - L_n\|_2^2 .$$

with $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, . . . .


Using our usable theorem

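Theorem

There exists $Q \in O(k)$ such that, letting $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i' - e_{\mathrm{Cluster}(i)}\|^2 \le \frac{2k}{n ((p - q)(n/k) - qn)^2} \, q^2 n^2 \le \frac{2 k^3 q^2}{n (p - q - qk)^2}$$

with $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, . . . .

Works for $q < p/(k + 1) - \varepsilon$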


Better estimate of $\delta$

$$\lambda_1(\tilde{L}_n) = \cdots = \lambda_k(\tilde{L}_n) = (p - q) \frac{n}{k} ,$$

$$\lambda_{k+1}(\tilde{L}_n) = \cdots = \lambda_n(\tilde{L}_n) = 0 ,$$

$$\lambda_1(\tilde{L}_n') = qn + (p - q) \frac{n}{k} ,$$

$$\lambda_2(\tilde{L}_n') = \cdots = \lambda_k(\tilde{L}_n') = (p - q) \frac{n}{k} ,$$

$$\lambda_{k+1}(\tilde{L}_n') = \cdots = \lambda_n(\tilde{L}_n') = 0 ,$$

Can take $\delta = (p - q) n / k$
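The eigenvalue lists for the block example can be confirmed numerically. The sketch below assumes NumPy and uses the $9 \times 9$, $k = 3$ layout of the example with illustrative values $p = 0.6$, $q = 0.4$; eigenvalues are sorted in descending order because, after the rescaling, the relevant eigenvectors are the top ones.

```python
import numpy as np

n, k, p, q = 9, 3, 0.6, 0.4                        # illustrative values
m = n // k                                         # nodes per cluster
blocks = np.kron(np.eye(k), np.ones((m, m)))       # 1 inside the diagonal blocks

L_unpert = (p - q) * blocks                        # unperturbed matrix, entries p-q or 0
L_pert = L_unpert + q * np.ones((n, n))            # perturbed matrix, entries p or q

ev_unpert = np.sort(np.linalg.eigvalsh(L_unpert))[::-1]   # descending
ev_pert = np.sort(np.linalg.eigvalsh(L_pert))[::-1]
pert_norm = np.linalg.norm(L_pert - L_unpert, 2)          # should equal q * n

# Gap between the top-k eigenvalues of the unperturbed matrix
# and the bottom n-k eigenvalues of the perturbed one
delta = ev_unpert[k - 1] - ev_pert[k]
```

With these numbers `delta` evaluates to $(p - q) n / k = 0.6$, larger than the gap $(p - q) n / k - qn$ that the cruder argument would use (which is negative here).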

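Better estimate of $\delta$

Theorem

There exists $Q \in O(k)$ such that, letting $Y' = [u_1' | \dots | u_k'] Q$, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i' - e_{\mathrm{Cluster}(i)}\|^2 \le \frac{2 k^3 q^2}{n (p - q)^2}$$

with $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, . . . .

Works for

$$q \le p \Big\{ 1 - C \frac{k^{3/2}}{n^{1/2}} \Big\}$$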

Well, this was not that impressive. . . :-(


A reasonably interesting example

$$A_{ij} = \begin{cases} 1 & \text{with probability } P_{ij} , \\ 0 & \text{otherwise.} \end{cases}$$


Where

$P =$ [matrix shown as an image on the slide]

Does this behave as before?


Much more impressive!

[Figure: both axes run from 0.0 to 1.0; n = 150, k = 3, p = 0.6, q = 0.4]

EE 378B: Inference, Estimation, and Information Processing

Lecture 6

Andrea Montanari

Stanford University

April 15, 2015


Outline

1 The planted model. . .

2 . . . and its analysis


The planted model


Theoretical Computer Science: Planted partition model.

Statistics: Stochastic block model.


Probability matrix

$P =$ [matrix shown as an image on the slide]

$k$ clusters of size $n/k$, $0 \le q < p \le 1$.


Similarity

$$A_{ij} = \begin{cases} 1 & \text{with probability } P_{ij} , \\ 0 & \text{otherwise.} \end{cases}$$

Think of this as the adjacency matrix of a graph G .
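Sampling from the planted model takes only a few lines. The sketch below assumes NumPy, that $n$ is divisible by $k$, that nodes are ordered by cluster, and that the graph has no self-loops (the slides do not specify how the diagonal is treated); `sample_sbm` is a hypothetical helper name.

```python
import numpy as np

def sample_sbm(n, k, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix with k equal-size clusters."""
    labels = np.repeat(np.arange(k), n // k)              # cluster of each node
    P = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < P, k=1)          # independent draws for i < j
    A = (upper | upper.T).astype(float)                   # symmetrize, zero diagonal
    return A, P, labels

rng = np.random.default_rng(0)
A, P, labels = sample_sbm(150, 3, 0.6, 0.4, rng)

within_density = A[:50, :50].mean()        # first cluster against itself
across_density = A[:50, 50:100].mean()     # first cluster against second
```

The within-cluster density should land near $p$ and the across-cluster density near $q$, up to sampling noise.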


Here is what it looks like

[Figure: both axes run from 0.0 to 1.0; n = 150, k = 3, p = 0.6, q = 0.4]


Can you see the clusters?

[Figure: both axes run from 0.0 to 1.0; n = 150, k = 3, p = 0.6, q = 0.4]


Randomly permuting the items

[Figure: both axes run from 0.0 to 1.0; n = 150, k = 3, p = 0.6, q = 0.4]


Will we succeed?

[Figure: eM$val plotted against Index (0 to 150); n = 150, k = 3, p = 0.6, q = 0.4]


Will we succeed?

[Figure: scatter plot with eM$vec[, 2] on the horizontal axis and eM$vec[, 3] on the vertical axis, each point drawn as its cluster label (1, 2, or 3); n = 150, k = 3, p = 0.6, q = 0.4]


Slightly different p and q

[Figure: both axes run from 0.0 to 1.0; n = 150, k = 3, p = 0.65, q = 0.35]


Will we succeed?

[Figure: eM$val plotted against Index (0 to 150); n = 150, k = 3, p = 0.65, q = 0.35]


Will we succeed?

[Figure: scatter plot with eM$vec[, 2] on the horizontal axis and eM$vec[, 3] on the vertical axis, each point drawn as its cluster label (1, 2, or 3); n = 150, k = 3, p = 0.65, q = 0.35]


. . . and its analysis


The question

$$p = \frac{1}{2}(1 + \varepsilon) , \qquad q = \frac{1}{2}(1 - \varepsilon)$$

Tradeoff $n$ vs $\varepsilon$???


I cheated a little bit!!!

Largest eigenvalues/eigenvectors of $A$ (instead of Laplacian)

$I - L = D^{-1/2} A D^{-1/2}$

$D = \mathrm{diag}(d_1, d_2, \dots, d_n)$

$d_i = \sum_{j \in [n]} A_{ij}$


Plot $d_i$ vs $i$

[Figure: degrees (axis label "sum", 0 to 100) plotted against Index (0 to 150)]


Why?

$$d_i = \sum_{j=1}^{n} A_{ij} \approx \sum_{j=1}^{n} P_{ij} = \frac{n}{k}(p - q) + nq$$

Therefore $D \approx \text{const.} \cdot I$
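The approximation $d_i \approx \sum_j P_{ij}$ can be checked on a sampled graph. A minimal sketch, assuming NumPy, the parameter values from the plots ($n = 150$, $k = 3$, $p = 0.6$, $q = 0.4$), and a graph without self-loops (an assumption that shifts the expected degree slightly).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p, q = 150, 3, 0.6, 0.4
labels = np.repeat(np.arange(k), n // k)
P = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((n, n)) < P, k=1)       # independent draws for i < j
A = (upper | upper.T).astype(float)                # symmetric, zero diagonal

d = A.sum(axis=1)                                  # observed degrees d_i
d_expected = (n / k) * (p - q) + n * q             # row sum of P
relative_spread = d.std() / d.mean()               # how far D is from const * I
```

With these parameters `d_expected` is 70, and the relative spread of the degrees is typically below ten percent, which is the sense in which $D$ is close to a multiple of the identity.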


Strategy

Need to understand the top k eigenvectors of A

Idea: $A = P + X$, $X = A - P$, $\mathbb{E}\{X\} = 0$

X is a small perturbation


Eigenvectors/eigenvalues of P

$P =$ [matrix shown as an image on the slide]


Eigenvectors/eigenvalues of P

$u_1 = C (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)$

$u_2 = C' (k - 1, k - 1, k - 1, -1, -1, -1, -1, -1, -1, -1, -1, -1)$

$\cdots$

$u_k = C' (-1, -1, -1, -1, -1, -1, -1, -1, -1, k - 1, k - 1, k - 1)$

$$\lambda_1 = qn + (p - q) \frac{n}{k} ,$$

$$\lambda_2 = \cdots = \lambda_k = (p - q) \frac{n}{k}$$

$$\lambda_{k+1} = \cdots = \lambda_n = 0$$


Applying perturbation theory

$y_1, \dots, y_n \in \mathbb{R}^k$ embedding from spectral method (up to rotation)

$e_1, \dots, e_k \in \mathbb{R}^k$ standard basis

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - e_{\mathrm{cluster}(i)}\|^2 \le \frac{2k}{n ((p - q)(n/k) - \|X\|_2)^2} \|X\|_2^2 .$$


Applying perturbation theory

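$y_1, \dots, y_n \in \mathbb{R}^k$ embedding from spectral method (up to rotation)

$e_1, \dots, e_k \in \mathbb{R}^k$ standard basis

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - e_{\mathrm{cluster}(i)}\|^2 \le \frac{2k}{n (n\varepsilon/k - \|X\|)^2} \|X\|_2^2 .$$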


The single most important fact in random matrix theory

Claim

$$\|X\|_2 \lesssim \sqrt{n}$$


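Applying perturbation theory

$y_1, \dots, y_n \in \mathbb{R}^k$ embedding from spectral method (up to rotation)

$e_1, \dots, e_k \in \mathbb{R}^k$ standard basis

$$\frac{1}{n} \sum_{i=1}^{n} \|y_i - e_{\mathrm{cluster}(i)}\|^2 \le \frac{2k}{n (n\varepsilon/k - \|X\|)^2} \|X\|_2^2 \le \frac{2k}{(n\varepsilon/k - \sqrt{n})^2} \le \frac{k}{5n}$$

$$\varepsilon \ge \frac{10 k}{\sqrt{n}}$$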


Proving the claim

Claim

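$$\|X\|_2 \lesssim \sqrt{n}$$

IID entries ($i \le j$)

$$X_{ij} = \begin{cases} 1 - P_{ij} & \text{with prob } P_{ij} \\ -P_{ij} & \text{otherwise} \end{cases}$$

$\mathbb{E}\{X_{ij}\} = 0$, $|X_{ij}| \le 1$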


A first attempt

$$\mathbb{E}\{\|X\|_2^2\} \le \mathbb{E}\{\|X\|_F^2\} = \sum_{i,j=1}^{n} \mathbb{E}\{X_{ij}^2\} \le n^2$$

$$\mathbb{P}\{\|X\|_2 \ge 10 n\} \le \frac{1}{10}$$

This is A BAD BOUND!


A hammer

Theorem (Matrix Bernstein inequality)

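$Z_1, \dots, Z_m \in \mathbb{R}^{n \times n}$ independent symmetric, $\mathbb{E} Z_i = 0$, $\|Z_i\|_2 \le R$, $\|\sum_i \mathbb{E}\{Z_i^2\}\| \le \sigma^2$. Let $X = Z_1 + Z_2 + \cdots + Z_m$. Then

$$\mathbb{P}\{\|X\| \ge t\} \le n \exp\Big\{ -\frac{t^2}{6 (R t + \sigma^2)} \Big\}$$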


In our case

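$Z_{ij} = X_{ij} (e_i e_j^* + e_j e_i^*)$

$\|Z_{ij}\|_2 \le 1$

$Z_{ij}^2 = X_{ij}^2 (e_i e_i^* + e_j e_j^*)$

$$\sum_{ij} \mathbb{E}\{Z_{ij}^2\} = \mathrm{diag}(b) , \qquad b_i = \sum_j \mathbb{E}\{X_{ij}^2\} \le n$$

$$\Big\| \sum_{ij} \mathbb{E}\{Z_{ij}^2\} \Big\|_2 \le 2n$$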


Substituting

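$$\mathbb{P}\{\|X\| \ge t\} \le n \exp\Big\{ -\frac{t^2}{6 (R t + \sigma^2)} \Big\} \le n \exp\Big\{ -\frac{t^2}{6 (t + n)} \Big\}$$

Therefore

$$\mathbb{P}\{\|X\| \ge 10 \sqrt{n \log n}\} \le n \exp\Big\{ -\frac{100 \, n \log n}{6 (n + 10 \sqrt{n \log n})} \Big\} \le n \exp\Big\{ -\frac{100 \, n \log n}{20 n} \Big\} \le n e^{-5 \log n} = n^{-4}$$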

This is an OK bound
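To get a feel for these numbers, one can sample $X = A - P$ and compare $\|X\|_2$ with $\sqrt{n}$, with the threshold $10\sqrt{n \log n}$, and with the trivial Frobenius-type bound $n$. The sketch below assumes NumPy, the plot parameters $n = 150$, $k = 3$, $p = 0.6$, $q = 0.4$, and a graph without self-loops (so the diagonal of $X$ is set to zero by hand); the tail bound is evaluated with $R = 1$ and $\sigma^2 = n$ as substituted on the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p, q = 150, 3, 0.6, 0.4
labels = np.repeat(np.arange(k), n // k)
P = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

X = A - P
np.fill_diagonal(X, 0.0)           # assumption: ignore the diagonal, where A_ii = 0 was forced
norm_X = np.linalg.norm(X, 2)      # operator norm of the noise

t = 10.0 * np.sqrt(n * np.log(n))                     # threshold from the slide
tail_bound = n * np.exp(-t**2 / (6.0 * (t + n)))      # n exp{-t^2 / (6 (t + n))}
trivial_bound = float(n)                              # from E||X||_F^2 <= n^2
```

In a typical run `norm_X` is a small multiple of $\sqrt{n}$, far below both `t` and `trivial_bound`, consistent with the claim that the Frobenius argument loses a factor of order $\sqrt{n}$.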

