least singular value, circular law, and lindeberg...

39
Least singular value, circular law, and Lindeberg exchange Terence Tao Abstract. These lectures cover three loosely related topics in random matrix the- ory. First we discuss the techniques used to bound the least singular value of (non- Hermitian) random matrices, focusing particularly on the matrices with jointly independent entries. We then use these bounds to obtain the circular law for the spectrum of matrices with iid entries of finite variance. Finally, we discuss the Lindeberg exchange method which allows one to demonstrate universality of many spectral statistics of matrices (both Hermitian and non-Hermitian). 1. The least singular value This section 1 of the lecture notes is concerned with the behaviour of the least singular value σ n (M) of an n × n matrix M (or, more generally, the least non- trivial singular value σ p (M) of a n × p matrix with p 6 n). This quantity controls the invertibility of M. Indeed, M is invertible precisely when σ n (M) is non-zero, and the 2 operator norm kM -1 k op of M -1 is given by 1n (M). This quantity is also related to the condition number σ 1 (M)n (M)= kMk op kM -1 k op of M, which is of importance in numerical linear algebra. As we shall see in Section 2, the least singular value of M (and more generally, of the shifts 1 n M - zI for complex z) will be of importance in rigorously establishing the circular law for iid random matrices M. The least singular value 2 σ n (M)= inf kxk=1 kMxk, which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm kMk op = σ 1 (M)= sup kxk=1 kMxk 2010 Mathematics Subject Classification. Primary 60B20; Secondary 60F17. Key words and phrases. Least singular value, circular law, Lindeberg exchange method, random matri- ces. 1 The material in this section and the next is based on that in [56]. Thanks to Nick Cook and the anonymous referees for corrections and suggestions. The author is supported by NSF grant DMS- 1266164, the James and Carol Collins Chair, and by a Simons Investigator Award. 2 In these notes we use kk to denote the 2 norm on R n or C n . ©0000 (copyright holder)

Upload: others

Post on 10-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Least singular value circular law and Lindeberg exchange

Terence Tao

Abstract These lectures cover three loosely related topics in random matrix the-ory First we discuss the techniques used to bound the least singular value of (non-Hermitian) random matrices focusing particularly on the matrices with jointlyindependent entries We then use these bounds to obtain the circular law for thespectrum of matrices with iid entries of finite variance Finally we discuss theLindeberg exchange method which allows one to demonstrate universality of manyspectral statistics of matrices (both Hermitian and non-Hermitian)

1 The least singular value

This section1 of the lecture notes is concerned with the behaviour of the leastsingular value σn(M) of an n times n matrix M (or more generally the least non-trivial singular value σp(M) of a ntimesp matrix with p 6 n) This quantity controlsthe invertibility of M Indeed M is invertible precisely when σn(M) is non-zeroand the `2 operator norm Mminus1op of Mminus1 is given by 1σn(M) This quantity isalso related to the condition number σ1(M)σn(M) = MopMminus1op of M whichis of importance in numerical linear algebra As we shall see in Section 2 the leastsingular value of M (and more generally of the shifts 1radic

nMminus zI for complex z)

will be of importance in rigorously establishing the circular law for iid randommatrices M

The least singular value2

σn(M) = infx=1

Mx

which sits at the ldquohard edgerdquo of the spectrum bears a superficial similarity to theoperator norm

Mop = σ1(M) = supx=1

Mx

2010 Mathematics Subject Classification Primary 60B20 Secondary 60F17Key words and phrases Least singular value circular law Lindeberg exchange method random matri-ces1The material in this section and the next is based on that in [56] Thanks to Nick Cook and theanonymous referees for corrections and suggestions The author is supported by NSF grant DMS-1266164 the James and Carol Collins Chair and by a Simons Investigator Award2In these notes we use to denote the `2 norm on Rn or Cn

copy0000 (copyright holder)

2 Least singular value circular law and Lindeberg exchange

at the ldquosoft edgerdquo of the spectrum For strongly rectangular matrices the tech-niques that are useful to control the latter can also control the former but the sit-uation becomes more delicate for square matrices For instance the ldquoepsilon netrdquomethod that is so useful for understanding the operator norm can control someldquolow entropyrdquo portions of the infimum that arise from ldquostructuredrdquo or ldquocompress-iblerdquo choices of x but are not able to control the ldquogenericrdquo or ldquoincompressiblerdquochoices of x for which new arguments will be needed Similarly the momentmethod can give the coarse order of magnitude (for instance for rectangular ma-trices with p = yn for 0 lt y lt 1 it gives an upper bound of (1 minus

radicy+ o(1))n for

the least singular value with high probability thanks to the Marcenko-Pastur law)but again this method begins to break down for square matrices although onecan make some partial headway by considering negative moments such as trMminus2though these are more difficult to compute than positive moments trMk

So one needs to supplement these existing methods with additional tools Itturns out that the key issue is to understand the distance between one of the nrows X1 Xn isin Cn of the matrix M and the hyperplane spanned by the othernminus1 rows The reason for this is as follows First suppose that σn(M) = 0 so thatM is non-invertible and there is a linear dependence between the rows X1 XnThus one of the Xi will lie in the hyperplane spanned by the other rows and soone of the distances mentioned above will vanish in fact one expects many ofthe n distances to vanish Conversely whenever one of these distances vanishesone has a linear dependence and so σn(M) = 0

More generally if the least singular value σn(M) is small one genericallyexpects many of these n distances to be small also and conversely Thus controlof the least singular value is morally equivalent to control of the distance betweena row Xi and the hyperplane spanned by the other rows This latter quantity isbasically the dot product of Xi with a unit normal ni of this hyperplane

When working with random matrices with jointly independent coefficients wehave the crucial property that the unit normal ni (which depends on all the rowsother than Xi) is independent of Xi so even after conditioning ni to be fixed theentries of Xi remain independent As such the dot product Xi middot ni is a familiarscalar random walk and can be controlled by a number of tools most notablyLittlewood-Offord theorems and the Berry-Esseacuteen central limit theorem As itturns out this type of control works well except in some rare cases in whichthe normal ni is ldquocompressiblerdquo or otherwise highly structured but epsilon-netarguments can be used to dispose of these cases3

These methods rely quite strongly on the joint independence on all the entriesit remains a challenge to extend them to more general settings Even for Wignermatrices the methods run into difficulty because of the non-independence of

3This general strategy was first developed for the technically simpler singularity problem in [43] andthen extended to the least singular value problem in [52]

Terence Tao 3

some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)

To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent

Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0

11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm

Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)

for some c gt 0) one has

(112) Mop = σ1(M) 6 Cradicn

for some absolute constant C

We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here

Proof We writeMop = sup

xisinRnx=1Mx

thus the failure event Mop gt Cradicn is the union of the events Mx gt C

radicn as

x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality

Mop = Mx 6 My+ Mopyminus x

and thusMop 6 sup

xisinΣMx+ 1

2Mop

4 Least singular value circular law and Lindeberg exchange

or equivalentlyMop 6 2 sup

xisinΣMx

and so it suffices to show that

P(supxisinΣMx gt C

2radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx gt C2radicn)

The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that

(113) |Σ| 6 O(1)n

We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability

P(Mx gt C2radicn)

If we let Y1 Yn isin Rn be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 gtC2

4n

To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by

eminuscC2n4E exp(c

nsumj=1

|Yj middot x|2)

As the random vectors Y1 Yn are iid this is equal to

eminuscC2n4

(E exp(c|Y middot x|2)

)n

Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular one has

E exp(c|Y middot x|2) 1

if c gt 0 is a sufficiently small absolute constant This implies the bound

P(Mx gt C2radicn) exp(minuscC2n8 +O(n))

Taking C large enough we obtain the claim

Now we use the same method to establish a lower bound in the rectangularcase first established in [4]

Terence Tao 5

Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ

radicn

To prove this theorem we again use the ldquoepsilon net argumentrdquo We write

σp(M) = infxisinRpx=1

Mx

Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have

σp(M) gt infxisinΣMxminus εMop

and thus by (112) we have with exponentially high probability that

σp(M) gt infxisinΣMxminusCε

radicn

and so it suffices to show that

P( infxisinΣMx 6 2Cε

radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx 6 2Cεradicn)

The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that

(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n

We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability

P(Mx 6 2Cεradicn)

If we let Y1 Yn isin Rp be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 6 4C2ε2n

By Markovrsquos inequality the only way that this event can hold is if we have

|Yj middot x|2 6 8C2ε2δ

for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by

6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)

Now observe that the random variables Yj middot x are independent and so we canbound this expression by

6 2nP(|Y middot x| 6radic

8Cεδ12)(1minusδ2)n

where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

2 Least singular value circular law and Lindeberg exchange

at the ldquosoft edgerdquo of the spectrum For strongly rectangular matrices the tech-niques that are useful to control the latter can also control the former but the sit-uation becomes more delicate for square matrices For instance the ldquoepsilon netrdquomethod that is so useful for understanding the operator norm can control someldquolow entropyrdquo portions of the infimum that arise from ldquostructuredrdquo or ldquocompress-iblerdquo choices of x but are not able to control the ldquogenericrdquo or ldquoincompressiblerdquochoices of x for which new arguments will be needed Similarly the momentmethod can give the coarse order of magnitude (for instance for rectangular ma-trices with p = yn for 0 lt y lt 1 it gives an upper bound of (1 minus

radicy+ o(1))n for

the least singular value with high probability thanks to the Marcenko-Pastur law)but again this method begins to break down for square matrices although onecan make some partial headway by considering negative moments such as trMminus2though these are more difficult to compute than positive moments trMk

So one needs to supplement these existing methods with additional tools Itturns out that the key issue is to understand the distance between one of the nrows X1 Xn isin Cn of the matrix M and the hyperplane spanned by the othernminus1 rows The reason for this is as follows First suppose that σn(M) = 0 so thatM is non-invertible and there is a linear dependence between the rows X1 XnThus one of the Xi will lie in the hyperplane spanned by the other rows and soone of the distances mentioned above will vanish in fact one expects many ofthe n distances to vanish Conversely whenever one of these distances vanishesone has a linear dependence and so σn(M) = 0

More generally if the least singular value σn(M) is small one genericallyexpects many of these n distances to be small also and conversely Thus controlof the least singular value is morally equivalent to control of the distance betweena row Xi and the hyperplane spanned by the other rows This latter quantity isbasically the dot product of Xi with a unit normal ni of this hyperplane

When working with random matrices with jointly independent coefficients wehave the crucial property that the unit normal ni (which depends on all the rowsother than Xi) is independent of Xi so even after conditioning ni to be fixed theentries of Xi remain independent As such the dot product Xi middot ni is a familiarscalar random walk and can be controlled by a number of tools most notablyLittlewood-Offord theorems and the Berry-Esseacuteen central limit theorem As itturns out this type of control works well except in some rare cases in whichthe normal ni is ldquocompressiblerdquo or otherwise highly structured but epsilon-netarguments can be used to dispose of these cases3

These methods rely quite strongly on the joint independence on all the entriesit remains a challenge to extend them to more general settings Even for Wignermatrices the methods run into difficulty because of the non-independence of

3This general strategy was first developed for the technically simpler singularity problem in [43] andthen extended to the least singular value problem in [52]

Terence Tao 3

some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)

To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent

Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0

11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm

Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)

for some c gt 0) one has

(112) Mop = σ1(M) 6 Cradicn

for some absolute constant C

We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here

Proof We writeMop = sup

xisinRnx=1Mx

thus the failure event Mop gt Cradicn is the union of the events Mx gt C

radicn as

x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality

Mop = Mx 6 My+ Mopyminus x

and thusMop 6 sup

xisinΣMx+ 1

2Mop

4 Least singular value circular law and Lindeberg exchange

or equivalentlyMop 6 2 sup

xisinΣMx

and so it suffices to show that

P(supxisinΣMx gt C

2radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx gt C2radicn)

The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that

(113) |Σ| 6 O(1)n

We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability

P(Mx gt C2radicn)

If we let Y1 Yn isin Rn be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 gtC2

4n

To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by

eminuscC2n4E exp(c

nsumj=1

|Yj middot x|2)

As the random vectors Y1 Yn are iid this is equal to

eminuscC2n4

(E exp(c|Y middot x|2)

)n

Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular one has

E exp(c|Y middot x|2) 1

if c gt 0 is a sufficiently small absolute constant This implies the bound

P(Mx gt C2radicn) exp(minuscC2n8 +O(n))

Taking C large enough we obtain the claim

Now we use the same method to establish a lower bound in the rectangularcase first established in [4]

Terence Tao 5

Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ

radicn

To prove this theorem we again use the ldquoepsilon net argumentrdquo We write

σp(M) = infxisinRpx=1

Mx

Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have

σp(M) gt infxisinΣMxminus εMop

and thus by (112) we have with exponentially high probability that

σp(M) gt infxisinΣMxminusCε

radicn

and so it suffices to show that

P( infxisinΣMx 6 2Cε

radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx 6 2Cεradicn)

The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that

(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n

We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability

P(Mx 6 2Cεradicn)

If we let Y1 Yn isin Rp be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 6 4C2ε2n

By Markovrsquos inequality the only way that this event can hold is if we have

|Yj middot x|2 6 8C2ε2δ

for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by

6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)

Now observe that the random variables Yj middot x are independent and so we canbound this expression by

6 2nP(|Y middot x| 6radic

8Cεδ12)(1minusδ2)n

where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 3

some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)

To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent

Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0

11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm

Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)

for some c gt 0) one has

(112) Mop = σ1(M) 6 Cradicn

for some absolute constant C

We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here

Proof We writeMop = sup

xisinRnx=1Mx

thus the failure event Mop gt Cradicn is the union of the events Mx gt C

radicn as

x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality

Mop = Mx 6 My+ Mopyminus x

and thusMop 6 sup

xisinΣMx+ 1

2Mop

4 Least singular value circular law and Lindeberg exchange

or equivalentlyMop 6 2 sup

xisinΣMx

and so it suffices to show that

P(supxisinΣMx gt C

2radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx gt C2radicn)

The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that

(113) |Σ| 6 O(1)n

We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability

P(Mx gt C2radicn)

If we let Y1 Yn isin Rn be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 gtC2

4n

To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by

eminuscC2n4E exp(c

nsumj=1

|Yj middot x|2)

As the random vectors Y1 Yn are iid this is equal to

eminuscC2n4

(E exp(c|Y middot x|2)

)n

Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular one has

E exp(c|Y middot x|2) 1

if c gt 0 is a sufficiently small absolute constant This implies the bound

P(Mx gt C2radicn) exp(minuscC2n8 +O(n))

Taking C large enough we obtain the claim

Now we use the same method to establish a lower bound in the rectangularcase first established in [4]

Terence Tao 5

Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ

radicn

To prove this theorem we again use the ldquoepsilon net argumentrdquo We write

σp(M) = infxisinRpx=1

Mx

Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have

σp(M) gt infxisinΣMxminus εMop

and thus by (112) we have with exponentially high probability that

σp(M) gt infxisinΣMxminusCε

radicn

and so it suffices to show that

P( infxisinΣMx 6 2Cε

radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx 6 2Cεradicn)

The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that

(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n

We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability

P(Mx 6 2Cεradicn)

If we let Y1 Yn isin Rp be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 6 4C2ε2n

By Markovrsquos inequality the only way that this event can hold is if we have

|Yj middot x|2 6 8C2ε2δ

for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by

6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)

Now observe that the random variables Yj middot x are independent and so we canbound this expression by

6 2nP(|Y middot x| 6radic

8Cεδ12)(1minusδ2)n

where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

4 Least singular value circular law and Lindeberg exchange

or equivalentlyMop 6 2 sup

xisinΣMx

and so it suffices to show that

P(supxisinΣMx gt C

2radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx gt C2radicn)

The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that

(113) |Σ| 6 O(1)n

We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability

P(Mx gt C2radicn)

If we let Y1 Yn isin Rn be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 gtC2

4n

To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by

eminuscC2n4E exp(c

nsumj=1

|Yj middot x|2)

As the random vectors Y1 Yn are iid this is equal to

eminuscC2n4

(E exp(c|Y middot x|2)

)n

Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular one has

E exp(c|Y middot x|2) 1

if c gt 0 is a sufficiently small absolute constant This implies the bound

P(Mx gt C2radicn) exp(minuscC2n8 +O(n))

Taking C large enough we obtain the claim

Now we use the same method to establish a lower bound in the rectangularcase first established in [4]

Terence Tao 5

Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ

radicn

To prove this theorem we again use the ldquoepsilon net argumentrdquo We write

σp(M) = infxisinRpx=1

Mx

Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have

σp(M) gt infxisinΣMxminus εMop

and thus by (112) we have with exponentially high probability that

σp(M) gt infxisinΣMxminusCε

radicn

and so it suffices to show that

P( infxisinΣMx 6 2Cε

radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx 6 2Cεradicn)

The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that

(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n

We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability

P(Mx 6 2Cεradicn)

If we let Y1 Yn isin Rp be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 6 4C2ε2n

By Markovrsquos inequality the only way that this event can hold is if we have

|Yj middot x|2 6 8C2ε2δ

for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by

6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)

Now observe that the random variables Yj middot x are independent and so we canbound this expression by

6 2nP(|Y middot x| 6radic

8Cεδ12)(1minusδ2)n

where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 5

Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ

radicn

To prove this theorem we again use the ldquoepsilon net argumentrdquo We write

σp(M) = infxisinRpx=1

Mx

Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have

σp(M) gt infxisinΣMxminus εMop

and thus by (112) we have with exponentially high probability that

σp(M) gt infxisinΣMxminusCε

radicn

and so it suffices to show that

P( infxisinΣMx 6 2Cε

radicn)

is exponentially small in n From the union bound we can upper bound this bysumxisinΣ

P(Mx 6 2Cεradicn)

The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that

(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n

We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability

P(Mx 6 2Cεradicn)

If we let Y1 Yn isin Rp be the rows of M we can write this as

P

nsumj=1

|Yj middot x|2 6 4C2ε2n

By Markovrsquos inequality the only way that this event can hold is if we have

|Yj middot x|2 6 8C2ε2δ

for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by

6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)

Now observe that the random variables Yj middot x are independent and so we canbound this expression by

6 2nP(|Y middot x| 6radic

8Cεδ12)(1minusδ2)n

where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

6 Least singular value circular law and Lindeberg exchange

We write x = (x1 xn) so that Y middot x is a random walk

Y middot x = ξ1x1 + middot middot middot+ ξnxn

To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem

Exercise 116 Show4 that

supt

P(|Y middot xminus t| 6 r) r

x+

1x3

nsumj=1

|xj|3

for any r gt 0 and any non-zero xConclude in particular that if sum

j|xj|6ε

|xj|2 gt η

for some η gt 0 thensupt

P(|Y middot xminus t| 6radic

8Cε)η ε

(Hint condition out all the xj with |xj| gt ε)

Let us temporarily call x incompressible ifsumj|xj|6ε

|xj|2 gt η

and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound

P(Mx 6 2Cεradicn) Oη(ε)

(1minusδ2)n

and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)

It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))

in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound

Exercise 117 For any unit vector x show that

P(|Y middot x| 6 κ) 6 1 minus κ

for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2

E(Z2) valid for any non-negative random variable

Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments

4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with

sumnj=1 |xj|

3x3 replaced by (sum3j=1 |xj|

3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 7

on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that

P(Mx 6 αradicn) exp(minuscn)

for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n

Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114

12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability

P(σn(M) = 0)

ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound

P(σn(M) = 0) gt1

2n

and it is conjectured that this bound is essentially tight in the sense that

P(σn(M) = 0) =(

12+ o(1)

)n

but this remains open the best bound currently is [18] and gives

P(σn(M) = 0) 6(

1radic2+ o(1)

)n

We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]

Proposition 121 We have P(σn(M) = 0) 1n12

To show this we need the following combinatorial fact due to Erdoumls [25]

Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12

Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most

(nbn2c

) in minus1 1n with the product partial ordering The

claim now easily follows from this theorem and Stirlingrsquos formula

5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

8 Least singular value circular law and Lindeberg exchange

Note that we also have the obvious bound

(123) P(Y middot x = 0) 6 12

for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11

we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)

(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns

Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise

Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc

(nk

) we may assume that it is the first k entries that are non-zero for some

1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk

) we may assume that it is the top left minor which does this In particular we

can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1

radick))(nminusk) giving a total cost ofsum

16k6εn

bεnc(n

k

)2min(12O(1

radick))minus(nminusk)

which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)

The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E

Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 9

then we see from double counting that

P(E) 61εn

nsumk=1

P(Ek)

By symmetry we thus haveP(E) 6

P(En)

To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows

Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier

13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1

radicn on the av-

erage Another argument supporting this heuristic scomes from the followingidentity

Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and

sumni=1 σi(M)minus2 =sumn

i=1 dist(XiVi)minus2

The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that

sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-

tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)

radicn)

Now we give a rigorous lower bound

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

10 Least singular value circular law and Lindeberg exchange

Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas

P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)

where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ

This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result

The scale 1radicn that we are working at here is too fine to use epsilon net ar-

guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as

P(Mx 6 λradicn for some unit vector x isin Cn)

We can assume that Mop 6 Cradicn for some absolute constant C as the event

that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector

x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ

radicn for some compressible x In fact we

can even dispose of the larger event that Mx 6 λradicn for some compressible x

By paying an entropy cost of(nbεnc

) we may assume that x is within ε of a vector

y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that

My 6 (λ+Cε)radicn

Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)

radicn) But by Theorem 114 this occurs

with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most

(nbεnc

)O(exp(minuscn)) which is acceptable if ε

is small enoughThus we may assume now that Mx gt λ

radicn for all compressible unit vectors

x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors

y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M

with the ith column removedThe remaining case is if Mx 6 λ

radicn for some incompressible x Let us call

this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus

x1Y1 + middot middot middot+ xnYn 6 λradicn

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 11

Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that

|xi|dist(YiWi) 6 λradicn

for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε

radicn for at least εn values of i and thus

(133) dist(YiWi) 6 λε

for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that

P(E) 61εn

nsumi=1

P(Ei)

and thus by symmetryP(E) 6

P(En)

(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have

|Yi middot y| 6 λε

Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives

Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2

(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε

This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim

Remark 135 By refining these arguments one can obtain an estimate of the form

(136) σn(1radicnMn minus zI) gt nminusA

for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section

14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

12 Least singular value circular law and Lindeberg exchange

Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has

(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)

We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ

radicn then

(143) ylowastMminus1 6radicnyλ

for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of

Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1

|y middotCi|2 6 ny2λ2

We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have

y2 = dist(XnVnminus1)2

and on the other hand we have for any 1 6 i lt n that

y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)

and so

(144)nminus1sumi=1

|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2

If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the

i so the probability in (142) can be bounded by

1n

nminus1sumi=1

P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))

which by symmetry can be bounded by

P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)

2λ2))

Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality

Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then

P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)

for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have

P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 13

for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)

From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of

P(Xn middot πn(C1) = Oε(1λ)) +O(ε)

Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details

We remark that stronger bounds for the upper tail have recently been obtainedin [48]

15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that

radicnσn(M) converges in distribution to the distribution microE =

1+radicx

2radicxeminusx2minus

radicx dx as n rarr infin It turns out that this result can be extended to

other ensembles with the same mean and variance In particular we have thefollowing result from [60]

Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in

distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)

This should be compared with Theorems 132 141 which show thatradicnσn(M)

have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition

The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of

radicnσn(M) and in particular

that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble

The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas

In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

14 Least singular value circular law and Lindeberg exchange

the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem

Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix

(152) (π(Xia) middot π(Xib))16ab6k

of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik

of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik

is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem

On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1

Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1

In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution

Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-

tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically

universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method

Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show

6These covariance matrix distributions are also known as Wishart distributions

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 15

that

EAlowastAminusn

kBlowastB2

F 6n

k

nsumk=1

Rk4

where F is the Frobenius norm AF = tr(AlowastA)12

Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M

minus1)lowast(Mminus1)) = σn(M)minus2

to differ byO(n2k) from nkσ1(B

lowastB) = nkσ1(B)

2 In principle this gives us asymp-totic universality on

radicnσn(M) from the already established universality of B

There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this

Lemma 155 For any 1 6 k lt i 6 n one has

dist(XiVi) gtπi(Xi)

1 +sumkj=1

πi(Xj)πi(Xi)dist(XjVj)

where πi is the orthogonal projection onto the space spanned by X1 XkXi

Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that

dist(XnVnminus1) gtXn

1 +sumnminus1j=1

XjXndist(XjVj)

Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that

x =

nsumj=1

(x middotXj)Rj

for any vector x applying this to Xn we obtain

Xn = Xn2Rn +

nminus1sumj=1

(Xj middotXn)Rj

and hence by the triangle inequality

Xn2Rn 6 Xn+nminus1sumj=1

XjXnRj

Using the fact that Rj = 1dist(XjRj) the claim follows

In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

16 Least singular value circular law and Lindeberg exchange

(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151

2 The circular law

In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles

The basic result in this area is

Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic

nMn

converges (in the vague topology) both

in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where

xy are the real and imaginary coordinates of the complex plane

This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic

nMn minus zI was not fully justified at the time A rigorous proof of the

circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]

21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 17

fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded

The canonical example of spectral instability comes from perturbing the rightshift matrix

U0 =

0 1 0 0

0 0 1 0

0 0 0 0

to the matrix

Uε =

0 1 0 0

0 0 1 0

ε 0 0 0

for some ε gt 0

The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum

One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)

minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series

(U0 minus zI)minus1 = minus

1zminusU0

z2 minus minusUnminus1

0zn

so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum

Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue

This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op

7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

18 Least singular value circular law and Lindeberg exchange

or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI

Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)

22 Incompleteness of the moment method In the Hermitian case the mo-ments

1n

tr(1radicnM)k =

intR

xk dmicro 1radicnMn

(x)

of a matrix can be used (in principle) to understand the distribution micro 1radicnMn

completely (at least when the measure micro 1radicnMn

has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)

In the non-Hermitian case the spectral measure micro 1radicnMn

is now supported onthe complex plane rather than the real line One still has the formula

1n

tr(1radicnM)k =

intC

zk dmicro 1radicnMn

(z)

but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure

This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie

1n

tr(

1radicnU

)k=

1n

tr(

1radicnUε

)k= 0

for k = 1 nminus 1 Thus we haveintC

zk dmicro 1radicnU

(z) =

intC

zk dmicro 1radicnUε

(z) = 0

for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic

nU

and micro 1radicnUε

are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint

C

zk dmicro = 0

8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 19

for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions

If one could somehow control the mixed momentsintC

zkzl dmicro 1radicnMn

(z) =1n

nsumj=1

(1radicnλj(Mn)

)k( 1radicnλj(Mn)

)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1

n tr( 1radicnMn)

k( 1radicnMlowastn)

l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal

Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations

Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1

n tr( 1radicnMn)

k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that

intC zk dmicro 1radic

nMn

(z)

converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above

Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1

n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit

23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform

sn(z) =1n

tr(

1radicnMn minus zI

)minus1=

intC

dmicro 1radicnMn

(w)

wminus z

This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows

Exercise 231 Show thatmicro 1radic

nMn

=1πpartzsn(z)

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

20 Least singular value circular law and Lindeberg exchange

in the sense of distributions where

partz =12(part

partx+ i

part

party)

is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense

One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic

nMn is 1+o(1) almost surely this combined with (222) and Laurent

expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic

nMn

converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure

For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential

fn(z) =

intC

log |wminus z|dmicro 1radicnMn

(w)

It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)

sn(z) =

(minuspart

partx+ i

part

party

)fn(z)

and thus gn is related to the spectral measure by the distributional formula9

(232) micro 1radicnMn

=1

2π∆fn

where ∆ = part2

partx2 + part2

party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-

tions of convergence

Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to

f(z) =

intC

log |zminusw|dmicro(w)

for some probability measure micro Then micro 1radicnMn

converges almost surely (resp in probabil-ity) to micro in the vague topology

Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise

9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 21

On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)

uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim

Exercise 234 Establish the convergence in probability version of Theorem 233

Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)

Observe that the logarithmic potential

fn(z) =1n

nsumj=1

log∣∣∣∣λj(Mn)radic

nminus z

∣∣∣∣can be rewritten as a log-determinant

fn(z) =1n

log∣∣∣∣det(

1radicnMn minus zI)

∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values

|detA| =nprodj=1

σj(A) =

nprodj=1

λj(AlowastA)12

and thusfn(z) =

12

intinfin0

log x dνnz(x)

where dνnz is the spectral measure of the matrix(

1radicnMn minus zI

)lowast (1radicnMn minus zI

)

The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic

nMn

is that the matrix ( 1radicnMn minus zI)lowast( 1radic

nMn minus zI) is

self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to

12

intinfin0

log x dνz(x)

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

22 Least singular value circular law and Lindeberg exchange

A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem

Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of

intinfin0 F(x) dνnz to

intinfin0 F(x) dνz for F continuous and compactly

supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin

0log x dνnz(x)rarr

intinfin0

log x dνz(x)

can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity

The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin

The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound

σn

(1radicnMn minus zI

)gt nminusC

asymptotically almost surely for the least singular value of 1radicnMn minus zI This has

the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)

Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth

moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic

nMn minus zI are going to be as small as nminusc most will be much larger

than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula

|det(A)| =nprodj=1

dist(XjVj)

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 23

where Xj is the jth row and Vj is the span of X1 Xjminus1

Exercise 235 Establish the inequalitynprod

j=n+1minusm

σj(A) 6mprodj=1

dist(XjVj) 6mprodj=1

σj(A)

for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)

The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic

nMn minus zI) tell us that dist(XjVj) gt nminusC with high

probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about

radicnminus j giving rise

to a net contribution to fn(z) of the form 1n

sum(1minusδ)nltjltnminusn099 O(log

radicnminus j)

which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also

Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the

logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1

(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic

nMn minus zI) has mean (minusz)n and variancesumnminus1

j=0n|z|2j

nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)

3 The Lindeberg exchange method

The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means

X1 + middot middot middot+Xnradicn

converge in distribution to the normal distribution N(0 1) as nrarrinfin thus

limnrarrinfin P(

X1 + middot middot middot+Xnradicn

6 t) = P(G 6 t)

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

24 Least singular value circular law and Lindeberg exchange

for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has

(301) EF

(X1 + middot middot middot+Xnradic

n

)= EF(G) + o(1)

as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin

Exercise 302 Why are these two claims equivalent

One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn

)

are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property

(303) EF

(X1 + middot middot middot+Xnradic

n

)= EF

(Y1 + middot middot middot+ Ynradic

n

)+ o(1)

as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods

and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims

(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then

EF

(Y1 + middot middot middot+ Ynradic

n

)= EF(G) + o(1)

(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|

3 ltinfin then

(304) EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)= o(1)

(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition

Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of

independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn

is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)

The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 25

the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition

X = X primeε +Xprimeprimeε

where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X

primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε

are iid copies of X primeprimeε We then have

X1 + middot middot middot+Xnradicn

=X prime1ε + middot middot middot+X

primenεradic

n+X primeprime1ε + middot middot middot+X

primeprimenεradic

n

If the central limit theorem holds in the case of finite third moment then therandom variable

X prime1ε+middotmiddotmiddot+Xprimenεradic

nconverges in distribution to G Meanwhile the

random variableX primeprime1ε+middotmiddotmiddot+X

primeprimenεradic

nhas mean zero and variance at most ε From this

we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim

The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference

EF

(X1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Ynradic

n

)as a telescoping series formed by the sum of the n terms

EF

(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic

n

)minus EF

(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic

n

)(305)

for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)

We can write (305) as

EF

(Zi +

1radicnXi

)minus EF

(Zi +

1radicnYi

)where Zi is the random variable

Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic

n

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

26 Least singular value circular law and Lindeberg exchange

The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion

F(Zi +1radicnXi) = F(Zi) +

1radicnXiFprime(Zi) +

12nX2iFprimeprime(Zi) +O

(1n32 |Xi|

3)

and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)

EF(Zi +1radicnXi) = EF(Zi) +

1radicn

EXiEFprime(Zi) +

12n

EX2iEF

primeprime(Zi) +O

(1n32

)

Similarly

EF(Zi +1radicnYi) = EF(Zi) +

1radicn

EYiEFprime(Zi) +

12n

EY2iEF

primeprime(Zi) +O

(1n32

)

Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2

i = EY2i As such the first three

terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic

n) for

iid random variables X1 Xn depends only on the first two moments of the Xi

Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus

EXni = 0 forall1 6 i 6 mn

andmnsumi=1

EX2ni = 1

Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1

EX2ni1|Xni|gtδ = o(1)

as n rarr infin Show that the random variablessummni=1 Xni converge in distribution

as nrarrinfin to a normal variable of mean zero and variance one

Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus

E(Xn|Fnminus1) = 0

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 27

andE(X2

n|Fnminus1) = 1

almost surely Assume also the bounded third moment hypothesis

E(|Xn|3|Fnminus1) 6 C

almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic

nconverge in distribution to a normal variable of mean zero

and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)

Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|

3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that

P(X1 + middot middot middot+Xnradic

n6 t) = P(G 6 t) +O(nminus18)

for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3

i =

EG3 or matching fourth moment EX4i = EG4

One can think of the Lindeberg method as having the schematic form of thetelescoping identity

Xn minus Yn =

nsumi=1

Yiminus1(Xminus Y)Xnminusi

which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity

Xn minus Yn =

int1

0

nsumi=1

((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ

which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)

i denote the random variable

X(θ)i = 1ti6θXi + 1tigtθYi

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

28 Least singular value circular law and Lindeberg exchange

thus for instance X(0)i = Yi and X

(1)i = Xi almost surely One then has the

following key derivative computation

Exercise 309 With the notation and assumptions as above show that(3010)

d

dθEF

(X(θ)1 + middot middot middot+X(θ)

nradicn

)=

nsumi=1

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)where

Z(θ)i =

X(θ)1 + middot middot middot+X(θ)

iminus1 +X(θ)i+1 + middot middot middot+X

(θ)nradic

n

In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ

From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1

0

nsumi=1

EF(Z(θ)i +

1radicnXi) minus EF(Z

(θ)i +

1radicnYi) dθ

Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that

EF

(Z(θ)i +

1radicnXi

)minus EF

(Z(θ)i +

1radicnYi

)= O(nminus32)

thus giving an alternate proof of (304)

Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this

The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic

n(ξij)16ij6n where ξij are real random variables of mean zero

and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that

P(|ξij| gt t) 6 C exp(minusct2)

for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 29

Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The

1radicn

term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])

The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices

We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum

Mn =1radicn

sum(ij)isin∆

ξijEij

where ∆ is the upper triangle

∆ = (i j) 1 6 i 6 j 6 n

and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j

and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas

Gn =sum

(ij)isin∆ηijEij

where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have

Eξkij = Eηkij

for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

30 Least singular value circular law and Lindeberg exchange

If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has

(3012) S(Mn) minus S(Gn) = o(1)

whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4

One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)

2 terms of theform

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)where for each (i j) isin ∆ Mnij is the real symmetric matrix

Mnij =sum

(i primej prime)lt(ij)

1radicnηi primej primeEi primej prime +

sum(i primej prime)gt(ij)

1radicnξi primej primeEi primej prime

where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form

(3013) EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= o(nminus2)

for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))

As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform

s(Mn z) =1n

trR(Mn z)

for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1

denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity

(3014) s(Mn z) =1n

nsumj=1

1λj minus z

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 31

On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity

R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)

whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series

(3015) R(Mn +An z) =infinsumk=0

(minusR(Mn z)An)kR(Mn z)

assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)

s(Mn +An z) = s(Mn z) +infinsumk=1

(minus1)k

ntr(R(Mn z)2An(R(Mn z)An)kminus1

)

To use these identities we invoke the following useful facts about Wigner ma-trices

Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n

minusA) the following statements hold

(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)

(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))

The same claims hold if one replaces one of the entries of Mn together with its transposeby zero

Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion

This yields some control on resolvents

Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1

nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1

nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

32 Least singular value circular law and Lindeberg exchange

to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates

On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η

We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1

nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1

nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1

nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic

nξijEij less than one) we then see from (3016) that

F(Mnij +1radicnξijEij) = F(Mnij)

+

infinsumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)

From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))

k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain

F

(Mnij +

1radicnξijEij

)= F(Mnij)

+

msumk=1

(minusξij)k

n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+Om

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that

EF

(Mnij +

1radicnξijEij

)= EF(Mnij)

+

msumk=1

E(minusξij)k

n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1

)+O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 33

for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that

Eξkij = Eηkij

for all k = 1 m we conclude on subtracting that

EF

(Mnij +

1radicnξijEij

)minus EF

(Mnij +

1radicnηijEij

)= O

(ηminus1nminusm+3

2 +o(1)(1 +1nη

)m+1)

Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)

bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3

ij = Eη3ij then we

may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0

bull If we assume third and fourth matching moments Eξ3ij = Eη3

ij Eξ4ij =

Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a

four moment theorem whenever η gt nminus 1312+ε for some ε gt 0

Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]

One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform

Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1

100k Show that the statistic

E

intRkψ(t1 tk)

kprodl=1

s

(Mn z+

tln

)dt1 dtk

enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl

n ) with their complex conjugates

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

34 Least singular value circular law and Lindeberg exchange

Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint

Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E

sum16i1ltmiddotmiddotmiddotltik6n

F(λi1 λik)

for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1

xminusz we see inparticular that

(3020)int

R

ρ(1)(x)dx

xminus z= nEs(Mn z)

Similarly setting k = 2 and F(x1 x2) = 1x1minusz1

1x2minusz2

+ 1x1minusz2

1x2minusz1

then settingk = 1 and F(x) = 1

(xminusz1)(xminusz2) and adding we see that

2int

R2ρ(2)(x1 x2)

dx1dx2

(x1 minus z1)(x2 minus z2)

+

intR2ρ(1)(x)

dx

(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)

By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have

Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic

1n

intR

ρ(1)(E+s

n)ψ(s) ds

enjoys a four moment theorem

Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic

E

intR

ψ(t)s(MnE+t

n+ iη) dt

enjoys a four moment theorem Applying (3020) we conclude that1n

intR

intR

ρ(1)(x)ψ(t)dxdt

xminus Eminus tn minus iη

enjoys a four moment theorem Making the change of variables x = E+ sn this

becomes1n

intR

ρ(1)(E+s

n)

(intR

ψ(t) dt

sminus tminus inη

)ds

Taking imaginary parts and dividing by π we conclude that

1n

intR

ρ(1)(E+s

n)

(intR

nηψ(t) dt

π((sminus t)2 + (nη)2)

)ds

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

Terence Tao 35

enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη

π((sminust)2+(nη)2)integrates

to one one can establish a bound of the formintR

nηψ(t) dt

π((sminus t)2 + (nη)2)= ψ(s) +O

(nminus1100

1 + s2

)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat

1n

intR

ρ(1)(E+

s

n

) ds

1 + s2 no(1)

(Exercise) The claim now follows from the triangle inequality

Exercise 3022 Justify the two steps marked (Exercise) in the above proof

Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic

1nk

intR

ρ(k)(E1 +

s1

n Ek +

skn

)ψ(s1 sk) ds1 sk

enjoys a four moment theorem

With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]

Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform

Mtn = (1 minus t)12Mn + t12Gn

where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

36 References

to time t = 1 by the formula

Mtn = (1 minus t)12Mn + t12Gn

Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10

that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]

References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices

with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16

[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16

[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16

[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4

[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-

mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28

[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28

10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn

and Mtn are not quite identical however it turns out that the additional error terms caused by this

are negligible when t = nminus1+ε for ε small enough

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

References 37

[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28

[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22

[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16

[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal

matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-

trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16

[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16

[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36

[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16

[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16

[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947

larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian

random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-

lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16

[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13

[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36

[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7

[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36

[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31

[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31

[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31

[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr

[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36

[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

38 References

[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16

[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21

[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16

[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math

(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability

Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner

matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949

larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is

singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related

Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7

[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24

[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr

[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19

[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16

[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13

[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb

4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-

tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622

[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10

[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10

[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12

[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12

[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1

[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9

[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22

[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method

References 39

[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13

[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35

[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35

[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23

[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36

[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36

[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17

[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16

[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16

UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu

  • The least singular value
    • The epsilon-net argument
    • Singularity probability
    • Lower bound for the least singular value
    • Upper bound for the least singular value
    • Asymptotic for the least singular value
      • The circular law
        • Spectral instability
        • Incompleteness of the moment method
        • The logarithmic potential
          • The Lindeberg exchange method