
Lecture 13: Concentration of sums of independent random matrices

May 20 - 22, 2020

(math.xmu.edu.cn/group/nona/damc/Lecture13.pdf)


1 Matrix calculus

Let us focus on $n \times n$ symmetric matrices.

$A \succeq B$ (of course, $B \preceq A$) means that $A - B$ is positive semidefinite.

This defines a partial order on the set of $n \times n$ symmetric matrices.

$\|A\|_2$ is the smallest non-negative number $M$ such that

$$\|Ax\|_2 \le M \|x\|_2 \quad \text{for all } x \in \mathbb{R}^n.$$

Functions of matrices

Let $f : \mathbb{R} \to \mathbb{R}$ be a function and $X$ be an $n \times n$ symmetric matrix. We define

$$f(X) = \sum_{i=1}^{n} f(\lambda_i)\, u_i u_i^T,$$

where $\lambda_i$ are the eigenvalues of $X$ and $u_i$ are the corresponding eigenvectors, satisfying

$$X = \sum_{i=1}^{n} \lambda_i\, u_i u_i^T.$$
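As a concrete illustration of this definition, here is a minimal NumPy sketch that evaluates $f(X)$ through the eigendecomposition; the helper name `matrix_function` is hypothetical, not from the lecture.

```python
import numpy as np

def matrix_function(X, f):
    """Evaluate f(X) = sum_i f(lambda_i) u_i u_i^T for symmetric X."""
    lam, U = np.linalg.eigh(X)   # eigenvalues (ascending) and eigenvectors
    return (U * f(lam)) @ U.T    # scales column u_i by f(lambda_i)

# Quick check: exp(X) = exp(X/2) exp(X/2), since X commutes with itself.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
X = (B + B.T) / 2
E = matrix_function(X, np.exp)
assert np.allclose(E, matrix_function(X / 2, np.exp) @ matrix_function(X / 2, np.exp))
```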


2 Matrix Bernstein's inequality

Theorem 1 (Matrix Bernstein's inequality)

Let $X_1, \ldots, X_N$ be independent, mean zero, $n \times n$ symmetric random matrices such that

$$\|X_i\|_2 \le K$$

almost surely for all $i$. Then for every $t \ge 0$ we have

$$\mathbb{P}\left\{ \Big\| \sum_{i=1}^{N} X_i \Big\|_2 \ge t \right\} \le 2n \cdot \exp\left( -\frac{t^2/2}{\sigma^2 + Kt/3} \right).$$

Here

$$\sigma^2 = \Big\| \sum_{i=1}^{N} \mathbb{E} X_i^2 \Big\|_2$$

is the norm of the "matrix variance" of the sum.
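As a quick sanity check of this bound, one can compare it against a Monte Carlo estimate of the tail for a simple ensemble $X_i = \varepsilon_i A_i$, with fixed symmetric $A_i$ of unit spectral norm and independent random signs $\varepsilon_i$ (a hedged sketch; the ensemble and parameters are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, trials = 10, 200, 2000

# Fixed symmetric matrices with ||A_i||_2 = 1; then X_i = eps_i * A_i is
# mean zero and bounded by K = 1 almost surely.
A = rng.standard_normal((N, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
A /= np.linalg.norm(A, ord=2, axis=(1, 2))[:, None, None]
K = 1.0

# sigma^2 = || sum_i E X_i^2 ||_2 ; here E X_i^2 = A_i^2.
sigma2 = np.linalg.norm((A @ A).sum(axis=0), ord=2)

t = 4.0 * np.sqrt(sigma2)
hits = sum(
    np.linalg.norm((rng.choice([-1.0, 1.0], size=N)[:, None, None] * A).sum(axis=0), ord=2) >= t
    for _ in range(trials)
)
bound = 2 * n * np.exp(-(t**2 / 2) / (sigma2 + K * t / 3))
print(f"empirical tail: {hits / trials:.4f}  vs  Bernstein bound: {bound:.4f}")
```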


The scalar case, where $n = 1$, is the classical Bernstein inequality for random variables.

A remarkable feature of matrix Bernstein's inequality, which makes it especially powerful, is that it does not require any independence of the entries (or the rows, or columns) of $X_i$; all that is needed is that the random matrices $X_i$ be independent from each other.

The proof of Theorem 1 is based on bounding the moment generating function (MGF) $\mathbb{E} \exp(\lambda S)$ of the sum $S = \sum_{i=1}^{N} X_i$.

Golden-Thompson inequality. For any symmetric $X$ and $Y$,

$$\operatorname{tr}\big( e^{X+Y} \big) \le \operatorname{tr}\big( e^{X} e^{Y} \big).$$

Lieb's lemma. For a given $n \times n$ symmetric $H$, the function

$$f(X) = -\operatorname{tr} \exp(H + \log X)$$

is convex on the space of $n \times n$ SPD matrices.


Jensen's inequality. For any convex function $f$ and a random matrix $X$, it holds that

$$f(\mathbb{E} X) \le \mathbb{E} f(X).$$

Lieb's inequality for random matrices. Let $H$ be a fixed $n \times n$ symmetric matrix and $Z$ be an $n \times n$ symmetric random matrix. Then

$$\mathbb{E} \operatorname{tr} \exp(H + Z) \le \operatorname{tr} \exp\big( H + \log \mathbb{E} e^{Z} \big).$$

MGF of a sum of independent random matrices. Let $X_1, \ldots, X_N$ be independent $n \times n$ symmetric random matrices. Then the sum $S = \sum_{i=1}^{N} X_i$ satisfies

$$\mathbb{E} \operatorname{tr} \exp(\lambda S) \le \operatorname{tr} \exp\left[ \sum_{i=1}^{N} \log \mathbb{E} e^{\lambda X_i} \right].$$


Lemma 2

Let $X$ be an $n \times n$ symmetric random matrix. Assume that $\mathbb{E} X = 0$ and $\|X\|_2 \le K$ almost surely. Then for all

$$0 < \lambda < \frac{3}{K}$$

we have

$$\mathbb{E} \exp(\lambda X) \preceq \exp\big( g(\lambda)\, \mathbb{E} X^2 \big), \quad \text{where} \quad g(\lambda) = \frac{\lambda^2/2}{1 - \lambda K/3}.$$


Proof. First check that for $0 < \lambda < 3/K$ and $|x| \le K$ it holds that

$$e^{\lambda x} \le 1 + \lambda x + g(\lambda) x^2.$$

Then extend this to the matrix version: for $0 < \lambda < 3/K$ and $\|X\|_2 \le K$ it holds that

$$e^{\lambda X} \preceq I + \lambda X + g(\lambda) X^2.$$

Finally, take the expectation and recall that $\mathbb{E} X = 0$ to obtain

$$\mathbb{E} \exp(\lambda X) \preceq I + g(\lambda)\, \mathbb{E} X^2 \preceq \exp\big( g(\lambda)\, \mathbb{E} X^2 \big),$$

where the last inequality follows from the matrix version of the scalar inequality $1 + z \le e^z$ for all $z \in \mathbb{R}$.
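The scalar inequality in the first step can be spot-checked numerically; a small sketch (grid and tolerance chosen arbitrarily):

```python
import numpy as np

# Check e^{lam*x} <= 1 + lam*x + g(lam)*x^2 for a grid of lam in (0, 3/K)
# and x in [-K, K].
K = 2.0
x = np.linspace(-K, K, 1001)
for lam in np.linspace(0.05, 3 / K - 0.05, 25):
    g = (lam**2 / 2) / (1 - lam * K / 3)
    assert np.all(np.exp(lam * x) <= 1 + lam * x + g * x**2 + 1e-12)
print("scalar inequality verified on the grid")
```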

Proof of Matrix Bernstein's inequality (Theorem 1).

For any $\lambda > 0$, Markov's inequality gives

$$
\begin{aligned}
\mathbb{P}\{\lambda_{\max}(S) \ge t\}
&= \mathbb{P}\{\exp(\lambda \cdot \lambda_{\max}(S)) \ge \exp(\lambda t)\}
\le e^{-\lambda t}\, \mathbb{E}\, e^{\lambda \cdot \lambda_{\max}(S)} \\
&= e^{-\lambda t}\, \mathbb{E}\, \lambda_{\max}(e^{\lambda S}) \quad \text{(check)} \\
&\le e^{-\lambda t}\, \mathbb{E} \operatorname{tr}(e^{\lambda S}) \\
&\le e^{-\lambda t}\, \operatorname{tr} \exp\Big[ \sum_{i=1}^{N} \log \mathbb{E} e^{\lambda X_i} \Big] \\
&\le e^{-\lambda t}\, \operatorname{tr} \exp\big[ g(\lambda) Z \big],
\end{aligned}
$$

where $Z = \sum_{i=1}^{N} \mathbb{E} X_i^2$. Consider the special choice $\lambda = t/(\sigma^2 + Kt/3)$. With this value of $\lambda$ we conclude that

$$\mathbb{P}\{\lambda_{\max}(S) \ge t\} \le n \cdot \exp\left( -\frac{t^2/2}{\sigma^2 + Kt/3} \right).$$

Next, repeat the argument for $-S$ to reinstate the absolute value.


Corollary 3

Let $X_1, \ldots, X_N$ be independent, mean zero, $n \times n$ symmetric random matrices such that $\|X_i\|_2 \le K$ almost surely for all $i$. Then

$$\mathbb{E} \Big\| \sum_{i=1}^{N} X_i \Big\|_2 \lesssim \sigma \sqrt{\log n} + K \log n, \quad \text{where} \quad \sigma = \Big\| \sum_{i=1}^{N} \mathbb{E} X_i^2 \Big\|_2^{1/2}.$$

Proof. The link from tail bounds to expectation bounds is provided by the basic identity

$$\mathbb{E} Z = \int_0^{\infty} \mathbb{P}\{Z > t\}\, dt,$$

which is valid for any non-negative random variable $Z$. Integrating the tail bound given by matrix Bernstein's inequality, you will arrive at the expectation bound we claimed.
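One hedged way to carry out this integration (a filled-in step, not spelled out on the slide): since $\sigma^2 + Kt/3 \le 2\sigma^2$ when $t \le 3\sigma^2/K$ and $\sigma^2 + Kt/3 \le 2Kt/3$ otherwise, the tail bound implies

$$\mathbb{P}\Big\{ \Big\| \sum_{i=1}^{N} X_i \Big\|_2 \ge t \Big\} \le 2n \left[ \exp\Big( -\frac{t^2}{4\sigma^2} \Big) + \exp\Big( -\frac{3t}{4K} \Big) \right] \quad \text{for all } t \ge 0.$$

Taking $t_0 = C\big( \sigma \sqrt{\log n} + K \log n \big)$ with a suitable absolute constant $C$, we get

$$\mathbb{E} \Big\| \sum_{i=1}^{N} X_i \Big\|_2 \le t_0 + \int_{t_0}^{\infty} 2n\, e^{-t^2/(4\sigma^2)}\, dt + \int_{t_0}^{\infty} 2n\, e^{-3t/(4K)}\, dt \lesssim \sigma \sqrt{\log n} + K \log n,$$

because this choice of $t_0$ makes both tail integrals $O(\sigma + K)$.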


3 Community recovery in networks

A network can be mathematically represented by a graph: a set of $n$ vertices with edges connecting some of them.

For simplicity, we will consider undirected graphs, where the edges do not have arrows.

Real-world networks often tend to have clusters, or communities: subsets of vertices that are connected by unusually many edges. (For example, in a friendship network, communities form around some common interests.)

An important problem in data science is to recover communities from a given network.

Spectral clustering: one of the simplest methods for community recovery.


3.1 Stochastic block model

Partition a set of $n$ vertices into two subsets ("communities") with $n/2$ vertices each, and connect every pair of vertices independently with probability $p$ if they belong to the same community and $q < p$ if not. The resulting random graph is said to follow the stochastic block model $G(n, p, q)$.

[Figure: a network generated according to the stochastic block model $G(n, p, q)$ with $n = 200$, $p = 1/20$, $q = 1/200$.]
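A minimal sketch of sampling such a graph (assuming NumPy; `sample_sbm` is a hypothetical helper, reused in the spectral clustering sketch of Section 3.2):

```python
import numpy as np

def sample_sbm(n, p, q, rng):
    """Adjacency matrix of a two-community stochastic block model
    G(n, p, q); the first n // 2 vertices form the first community."""
    half = n // 2
    labels = np.array([0] * half + [1] * (n - half))
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # independent edges, i < j
    return (upper + upper.T).astype(float)            # symmetric, zero diagonal

A = sample_sbm(200, 1 / 20, 1 / 200, np.random.default_rng(0))
```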


3.2 Spectral clustering

Suppose we are shown one instance of a random graph generated according to a stochastic block model $G(n, p, q)$. How can we find which vertices belong to which community?

The spectral clustering algorithm we are going to explain will do precisely this. It will be based on the spectrum of the adjacency matrix $A$ of the graph, which is the $n \times n$ symmetric matrix whose entries $A_{ij}$ equal 1 if the vertices $i$ and $j$ are connected by an edge, and 0 otherwise.

The adjacency matrix $A$ is a random matrix. Let us compute its expectation first. This is easy, since the entries of $A$ are Bernoulli random variables: if $i$ and $j$ belong to the same community then $\mathbb{E} A_{ij} = p$, and otherwise $\mathbb{E} A_{ij} = q$.


Thus $\mathbb{E} A$ has block structure; for example, if $n = 4$ then $\mathbb{E} A$ looks like this:

$$\mathbb{E} A = \begin{bmatrix} p & p & q & q \\ p & p & q & q \\ q & q & p & p \\ q & q & p & p \end{bmatrix}.$$

For illustration purposes, we grouped the vertices from each community together.

$\mathbb{E} A$ has rank 2, and the non-zero eigenvalues and the corresponding eigenvectors are

$$\lambda_1 = \left( \frac{p+q}{2} \right) n, \quad v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}; \qquad \lambda_2 = \left( \frac{p-q}{2} \right) n, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}.$$
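These values are easy to verify numerically; a small check for the $n = 4$ example, with arbitrary values of $p$ and $q$ (our own snippet):

```python
import numpy as np

p, q, n = 0.6, 0.2, 4
EA = np.array([[p, p, q, q],
               [p, p, q, q],
               [q, q, p, p],
               [q, q, p, p]])
top_two = np.sort(np.linalg.eigvalsh(EA))[::-1][:2]
print(top_two)                            # the remaining eigenvalues are 0
print((p + q) / 2 * n, (p - q) / 2 * n)   # matches lambda_1 and lambda_2
```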


The eigenvalues and eigenvectors of $\mathbb{E} A$ tell us a lot about the community structure of the underlying graph.

The first (larger) eigenvalue,

$$d = \left( \frac{p+q}{2} \right) n,$$

is the expected degree of any vertex of the graph.

The second eigenvalue,

$$\lambda_2 = \left( \frac{p-q}{2} \right) n,$$

tells us whether there is any community structure at all (which happens when $p \ne q$, and thus $\lambda_2 \ne 0$).

The signs of the components of the second eigenvector $v_2$ tell us exactly how to separate the vertices into the two communities.

The problem is that we do not know $\mathbb{E} A$. Instead, we know the adjacency matrix $A$. So, is it true that $A \approx \mathbb{E} A$?


Theorem 4 (Concentration of the stochastic block model)

Let $A$ be the adjacency matrix of a $G(n, p, q)$ random graph. Then

$$\mathbb{E} \|A - \mathbb{E} A\|_2 \lesssim \sqrt{d \log n} + \log n.$$

Here $d = (p+q)n/2$ is the expected degree.

Proof. To use matrix Bernstein's inequality, let us break $A$ into a sum of independent random matrices,

$$A = \sum_{i \le j} X_{ij},$$

where each matrix $X_{ij}$ contains a pair of symmetric entries of $A$, or one diagonal entry. Matrix Bernstein's inequality obviously applies to the sum

$$A - \mathbb{E} A = \sum_{i \le j} (X_{ij} - \mathbb{E} X_{ij}).$$


Corollary 3 gives

$$\mathbb{E} \|A - \mathbb{E} A\|_2 \lesssim \sigma \sqrt{\log n} + K \log n,$$

where

$$\sigma^2 = \Big\| \sum_{i \le j} \mathbb{E} (X_{ij} - \mathbb{E} X_{ij})^2 \Big\|_2 \quad \text{and} \quad K = \max_{i \le j} \|X_{ij} - \mathbb{E} X_{ij}\|_2.$$

It is a good exercise to check that

$$\sigma^2 \lesssim d \quad \text{and} \quad K \le 2.$$
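A hedged sketch of that exercise (our filled-in reasoning): for $i < j$, $(X_{ij} - \mathbb{E} X_{ij})^2 = (A_{ij} - \mathbb{E} A_{ij})^2 \big( e_i e_i^T + e_j e_j^T \big)$, so $\sum_{i \le j} \mathbb{E} (X_{ij} - \mathbb{E} X_{ij})^2$ is a diagonal matrix whose $i$-th entry is $\sum_{j} \operatorname{Var}(A_{ij}) \le \sum_{j} \mathbb{E} A_{ij} \le d$ (a Bernoulli variance is at most its mean), whence $\sigma^2 \lesssim d$; and by the triangle inequality, $\|X_{ij} - \mathbb{E} X_{ij}\|_2 \le \|X_{ij}\|_2 + \|\mathbb{E} X_{ij}\|_2 \le 1 + 1 = 2$.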

This completes the proof.


How useful is Theorem 4 for community recovery? Suppose that the network is not too sparse; namely,

$$d \gg \log n.$$

Then

$$\|A - \mathbb{E} A\|_2 \lesssim \sqrt{d \log n}, \quad \text{while} \quad \|\mathbb{E} A\|_2 = \lambda_1(\mathbb{E} A) = d,$$

which implies that

$$\|A - \mathbb{E} A\|_2 \ll \|\mathbb{E} A\|_2.$$

In other words, $A$ nicely approximates $\mathbb{E} A$: the relative error of approximation is small in the operator norm.


Classical results from the perturbation theory for matrices state that if $A$ and $\mathbb{E} A$ are close, then their eigenvalues and eigenvectors must also be close. The relevant perturbation results are Weyl's inequality for eigenvalues and the Davis-Kahan inequality for eigenvectors. (See the "bible" NLA book, MC.)

So we can use $v_2(A)$ to approximately recover the communities. This method is called spectral clustering.

Spectral Clustering Algorithm

Compute $v_2(A)$, the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix $A$ of the network. Use the signs of the coefficients of $v_2(A)$ to predict the community membership of the vertices.
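A minimal end-to-end sketch of the algorithm (assuming NumPy and the hypothetical `sample_sbm` from Section 3.1; the recovered labels are defined only up to swapping the two communities):

```python
import numpy as np

def spectral_cluster(A):
    """Split vertices by the signs of the eigenvector for the second
    largest eigenvalue of the adjacency matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    v2 = eigvecs[:, -2]                    # second largest eigenvalue
    return (v2 >= 0).astype(int)

rng = np.random.default_rng(0)
A = sample_sbm(200, 1 / 20, 1 / 200, rng)  # sampler from the Section 3.1 sketch
pred = spectral_cluster(A)
truth = np.array([0] * 100 + [1] * 100)
accuracy = max(np.mean(pred == truth), np.mean(pred != truth))  # labels up to swap
print(f"fraction of vertices classified correctly: {accuracy:.2f}")
```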



Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 8: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Proof of Matrix Bernsteinrsquos inequality (Theorem 1)

Pλmax(S) ge t = Pexp(λ middot λmax(S)) ge exp(λt)le eminusλtEeλmiddotλmax(S)

= eminusλtEλmax(eλS) (check)

le eminusλtEtr(eλS)

le eminusλttr exp983147983123N

i=1 logEeλXi

983148

le eminusλttr exp983045g(λ)Z

983046

where Z =983123N

i=1 EX2i Consider a special λ = t(σ2 +Kt3) With

this value of λ we conclude

P λmax(S) 983245 t 983249 n middot exp983061minus t22

σ2 +Kt3

983062

Next repeat the argument for minusS to reinstate the absolute value

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 8 18

Corollary 3

Let X1 XN be independent mean zero ntimes n symmetric randommatrices such that 983042Xi9830422 le K almost surely for all i Then

E

983056983056983056983056983056

N983131

i=1

Xi

9830569830569830569830569830562

≲ σ983155

log n+K log n

where σ =983056983056983056983123N

i=1 EX2i

98305698305698305612

2

Proof The link from tail bounds to expectation is provided by thebasic identity

EZ =

983133 infin

0P Z gt t dt

which is valid for any non-negative random variable Z Integrating thetail bound given by matrix Bernsteinrsquos inequality you will arrive at theexpectation bound we claimed

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 9 18

3 Community recovery in networks

A network can be mathematically represented by a graph a set ofn vertices with edges connecting some of them

For simplicity we will consider undirected graphs where the edgesdo not have arrows

Real world networks often tend to have clusters or communities -subsets of vertices that are connected by unusually many edges(For example a friendship network where communities formaround some common interests)

An important problem in data science is to recover communitiesfrom a given network

Spectral clustering one of the simplest methods for communityrecovery

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 10 18

31 Stochastic block model

Partition a set of n vertices into two subsets (ldquocommunitiesrdquo) withn2 vertices each and connect every pair of vertices independentlywith probability p if they belong to the same community andq lt p if not The resulting random graph is said to follow thestochastic block model G(n p q)

A network according to

stochastic block model

G(n p q)

n = 200

p = 120

q = 1200

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 11 18

32 Spectral clustering

Suppose we are shown one instance of a random graph generatedaccording to a stochastic block model G(n p q) How can we findwhich vertices belong to which community

The spectral clustering algorithm we are going to explain will doprecisely this It will be based on the spectrum of the adjacencymatrix A of the graph which is the ntimes n symmetric matrix whoseentries Aij equal 1 if the vertices i and j are connected by anedge and 0 otherwise

The adjacency matrix A is a random matrix Let us compute itsexpectation first This is easy since the entires of A are Bernoullirandom variables If i and j belong to the same community thenEAij = p and otherwise EAij = q

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 12 18

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 9: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Corollary 3

Let X1 XN be independent mean zero ntimes n symmetric randommatrices such that 983042Xi9830422 le K almost surely for all i Then

E

983056983056983056983056983056

N983131

i=1

Xi

9830569830569830569830569830562

≲ σ983155

log n+K log n

where σ =983056983056983056983123N

i=1 EX2i

98305698305698305612

2

Proof The link from tail bounds to expectation is provided by thebasic identity

EZ =

983133 infin

0P Z gt t dt

which is valid for any non-negative random variable Z Integrating thetail bound given by matrix Bernsteinrsquos inequality you will arrive at theexpectation bound we claimed

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 9 18

3 Community recovery in networks

A network can be mathematically represented by a graph a set ofn vertices with edges connecting some of them

For simplicity we will consider undirected graphs where the edgesdo not have arrows

Real world networks often tend to have clusters or communities -subsets of vertices that are connected by unusually many edges(For example a friendship network where communities formaround some common interests)

An important problem in data science is to recover communitiesfrom a given network

Spectral clustering one of the simplest methods for communityrecovery

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 10 18

31 Stochastic block model

Partition a set of n vertices into two subsets (ldquocommunitiesrdquo) withn2 vertices each and connect every pair of vertices independentlywith probability p if they belong to the same community andq lt p if not The resulting random graph is said to follow thestochastic block model G(n p q)

A network according to

stochastic block model

G(n p q)

n = 200

p = 120

q = 1200

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 11 18

32 Spectral clustering

Suppose we are shown one instance of a random graph generatedaccording to a stochastic block model G(n p q) How can we findwhich vertices belong to which community

The spectral clustering algorithm we are going to explain will doprecisely this It will be based on the spectrum of the adjacencymatrix A of the graph which is the ntimes n symmetric matrix whoseentries Aij equal 1 if the vertices i and j are connected by anedge and 0 otherwise

The adjacency matrix A is a random matrix Let us compute itsexpectation first This is easy since the entires of A are Bernoullirandom variables If i and j belong to the same community thenEAij = p and otherwise EAij = q

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 12 18

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 10: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

3 Community recovery in networks

A network can be mathematically represented by a graph a set ofn vertices with edges connecting some of them

For simplicity we will consider undirected graphs where the edgesdo not have arrows

Real world networks often tend to have clusters or communities -subsets of vertices that are connected by unusually many edges(For example a friendship network where communities formaround some common interests)

An important problem in data science is to recover communitiesfrom a given network

Spectral clustering one of the simplest methods for communityrecovery

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 10 18

31 Stochastic block model

Partition a set of n vertices into two subsets (ldquocommunitiesrdquo) withn2 vertices each and connect every pair of vertices independentlywith probability p if they belong to the same community andq lt p if not The resulting random graph is said to follow thestochastic block model G(n p q)

A network according to

stochastic block model

G(n p q)

n = 200

p = 120

q = 1200

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 11 18

32 Spectral clustering

Suppose we are shown one instance of a random graph generatedaccording to a stochastic block model G(n p q) How can we findwhich vertices belong to which community

The spectral clustering algorithm we are going to explain will doprecisely this It will be based on the spectrum of the adjacencymatrix A of the graph which is the ntimes n symmetric matrix whoseentries Aij equal 1 if the vertices i and j are connected by anedge and 0 otherwise

The adjacency matrix A is a random matrix Let us compute itsexpectation first This is easy since the entires of A are Bernoullirandom variables If i and j belong to the same community thenEAij = p and otherwise EAij = q

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 12 18

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 11: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

31 Stochastic block model

Partition a set of n vertices into two subsets (ldquocommunitiesrdquo) withn2 vertices each and connect every pair of vertices independentlywith probability p if they belong to the same community andq lt p if not The resulting random graph is said to follow thestochastic block model G(n p q)

A network according to

stochastic block model

G(n p q)

n = 200

p = 120

q = 1200

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 11 18

32 Spectral clustering

Suppose we are shown one instance of a random graph generatedaccording to a stochastic block model G(n p q) How can we findwhich vertices belong to which community

The spectral clustering algorithm we are going to explain will doprecisely this It will be based on the spectrum of the adjacencymatrix A of the graph which is the ntimes n symmetric matrix whoseentries Aij equal 1 if the vertices i and j are connected by anedge and 0 otherwise

The adjacency matrix A is a random matrix Let us compute itsexpectation first This is easy since the entires of A are Bernoullirandom variables If i and j belong to the same community thenEAij = p and otherwise EAij = q

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 12 18

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 12: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

32 Spectral clustering

Suppose we are shown one instance of a random graph generatedaccording to a stochastic block model G(n p q) How can we findwhich vertices belong to which community

The spectral clustering algorithm we are going to explain will doprecisely this It will be based on the spectrum of the adjacencymatrix A of the graph which is the ntimes n symmetric matrix whoseentries Aij equal 1 if the vertices i and j are connected by anedge and 0 otherwise

The adjacency matrix A is a random matrix Let us compute itsexpectation first This is easy since the entires of A are Bernoullirandom variables If i and j belong to the same community thenEAij = p and otherwise EAij = q

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 12 18

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 13: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Thus EA has block structure for example if n = 4 then EA lookslike this

EA =

983093

983097983097983095

p p q qp p q q

q q p pq q p p

983094

983098983098983096

For illustration purposes we grouped the vertices from eachcommunity together

EA has rank 2 and the non-zero eigenvalues and thecorresponding eigenvectors are

λ1 =

983061p+ q

2

983062n v1 =

983093

983097983097983095

11

11

983094

983098983098983096 λ2 =

983061pminus q

2

983062n v2 =

983093

983097983097983095

11

minus1minus1

983094

983098983098983096

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 13 18

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 14: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

The eigenvalues and eigenvectors of EA tell us a lot about thecommunity structure of the underlying graph

The first (larger) eigenvalue

d =

983061p+ q

2

983062n

is the expected degree of any vertex of the graph

The second eigenvalue 983061pminus q

2

983062n

tells us whether there is any community structure at all (whichhappens when p ∕= q and thus λ2 ∕= 0)

The signs of the components of the second eigenvector v2 tell usexactly how to separate the vertices into the two communities

The problem is that we do not know EA Instead we know theadjacency matrix A So is it true that A asymp EA

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 14 18

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 15: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Theorem 4 (Concentration of the stochastic block model)

Let A be the adjacency matrix of a G(n p q) random graph Then

E983042Aminus EA9830422 ≲983155

d log n+ log n

Here d = (p+ q)n2 is the expected degree

Proof To use matrix Bernsteinrsquos inequality let us break A into a sumof independent random matrices

A =983131

iji983249j

Xij

where each matrix Xij contains a pair of symmetric entries of A orone diagonal entry Matrix Bernsteinrsquos inequality obviously applies forthe sum

Aminus EA =983131

i983249j

(Xij minus EXij)

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 15 18

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 16: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Corollary 3 gives

E983042Aminus EA9830422 ≲ σ983155

log n+K log n

where

σ2 =

983056983056983056983056983056983056

983131

i983249j

E (Xij minus EXij)2

9830569830569830569830569830569830562

andK = max

ij983042Xij minus EXij9830422

It is a good exercise to check that

σ2 ≲ d

andK 983249 2

This completes the proof

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 16 18

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 17: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

How useful is Theorem 4 for community recovery Suppose thatthe network is not too sparse namely

d ≫ log n

Then983042Aminus EA9830422 ≲

983155d log n

while983042EA9830422 = λ1(EA) = d

which implies that

983042Aminus EA9830422 ≪ 983042EA9830422

In other words A nicely approximates EA the relative error orapproximation is small in the operator norm

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 17 18

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18

Page 18: Lecture 13: Concentration of sums of independent random matricesmath.xmu.edu.cn/group/nona/damc/Lecture13.pdf · 2020-05-21 · Lecture 13: Concentration of sums of independent random

Classical results from the perturbation theory for matrices statethat if A and EA are close then their eigenvalues andeigenvectors must also be close The relevant perturbation resultsare Weylrsquos inequality for eigenvalues and Davis-Kahanrsquos inequalityfor eigenvectors (See the bible NLA book MC)

So we can use v2(A) to approximately recover the communitiesThis method is called spectral clustering

Spectral Clustering Algorithm

Compute v2(A) the eigenvector corresponding to the secondlargest eigenvalue of the adjacency matrix A of the network Usethe signs of the coefficients of v2(A) to predict the communitymembership of the vertices

Independent Random Matrices DAMC Lecture 13 May 20 - 22 2020 18 18