least singular value, circular law, and lindeberg...
TRANSCRIPT
Least singular value circular law and Lindeberg exchange
Terence Tao
Abstract These lectures cover three loosely related topics in random matrix the-ory First we discuss the techniques used to bound the least singular value of (non-Hermitian) random matrices focusing particularly on the matrices with jointlyindependent entries We then use these bounds to obtain the circular law for thespectrum of matrices with iid entries of finite variance Finally we discuss theLindeberg exchange method which allows one to demonstrate universality of manyspectral statistics of matrices (both Hermitian and non-Hermitian)
1 The least singular value
This section1 of the lecture notes is concerned with the behaviour of the leastsingular value σn(M) of an n times n matrix M (or more generally the least non-trivial singular value σp(M) of a ntimesp matrix with p 6 n) This quantity controlsthe invertibility of M Indeed M is invertible precisely when σn(M) is non-zeroand the `2 operator norm Mminus1op of Mminus1 is given by 1σn(M) This quantity isalso related to the condition number σ1(M)σn(M) = MopMminus1op of M whichis of importance in numerical linear algebra As we shall see in Section 2 the leastsingular value of M (and more generally of the shifts 1radic
nMminus zI for complex z)
will be of importance in rigorously establishing the circular law for iid randommatrices M
The least singular value2
σn(M) = infx=1
Mx
which sits at the ldquohard edgerdquo of the spectrum bears a superficial similarity to theoperator norm
Mop = σ1(M) = supx=1
Mx
2010 Mathematics Subject Classification Primary 60B20 Secondary 60F17Key words and phrases Least singular value circular law Lindeberg exchange method random matri-ces1The material in this section and the next is based on that in [56] Thanks to Nick Cook and theanonymous referees for corrections and suggestions The author is supported by NSF grant DMS-1266164 the James and Carol Collins Chair and by a Simons Investigator Award2In these notes we use to denote the `2 norm on Rn or Cn
copy0000 (copyright holder)
2 Least singular value circular law and Lindeberg exchange
at the ldquosoft edgerdquo of the spectrum For strongly rectangular matrices the tech-niques that are useful to control the latter can also control the former but the sit-uation becomes more delicate for square matrices For instance the ldquoepsilon netrdquomethod that is so useful for understanding the operator norm can control someldquolow entropyrdquo portions of the infimum that arise from ldquostructuredrdquo or ldquocompress-iblerdquo choices of x but are not able to control the ldquogenericrdquo or ldquoincompressiblerdquochoices of x for which new arguments will be needed Similarly the momentmethod can give the coarse order of magnitude (for instance for rectangular ma-trices with p = yn for 0 lt y lt 1 it gives an upper bound of (1 minus
radicy+ o(1))n for
the least singular value with high probability thanks to the Marcenko-Pastur law)but again this method begins to break down for square matrices although onecan make some partial headway by considering negative moments such as trMminus2though these are more difficult to compute than positive moments trMk
So one needs to supplement these existing methods with additional tools Itturns out that the key issue is to understand the distance between one of the nrows X1 Xn isin Cn of the matrix M and the hyperplane spanned by the othernminus1 rows The reason for this is as follows First suppose that σn(M) = 0 so thatM is non-invertible and there is a linear dependence between the rows X1 XnThus one of the Xi will lie in the hyperplane spanned by the other rows and soone of the distances mentioned above will vanish in fact one expects many ofthe n distances to vanish Conversely whenever one of these distances vanishesone has a linear dependence and so σn(M) = 0
More generally if the least singular value σn(M) is small one genericallyexpects many of these n distances to be small also and conversely Thus controlof the least singular value is morally equivalent to control of the distance betweena row Xi and the hyperplane spanned by the other rows This latter quantity isbasically the dot product of Xi with a unit normal ni of this hyperplane
When working with random matrices with jointly independent coefficients wehave the crucial property that the unit normal ni (which depends on all the rowsother than Xi) is independent of Xi so even after conditioning ni to be fixed theentries of Xi remain independent As such the dot product Xi middot ni is a familiarscalar random walk and can be controlled by a number of tools most notablyLittlewood-Offord theorems and the Berry-Esseacuteen central limit theorem As itturns out this type of control works well except in some rare cases in whichthe normal ni is ldquocompressiblerdquo or otherwise highly structured but epsilon-netarguments can be used to dispose of these cases3
These methods rely quite strongly on the joint independence on all the entriesit remains a challenge to extend them to more general settings Even for Wignermatrices the methods run into difficulty because of the non-independence of
3This general strategy was first developed for the technically simpler singularity problem in [43] andthen extended to the least singular value problem in [52]
Terence Tao 3
some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)
To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent
Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0
11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm
Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)
for some c gt 0) one has
(112) Mop = σ1(M) 6 Cradicn
for some absolute constant C
We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here
Proof We writeMop = sup
xisinRnx=1Mx
thus the failure event Mop gt Cradicn is the union of the events Mx gt C
radicn as
x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality
Mop = Mx 6 My+ Mopyminus x
and thusMop 6 sup
xisinΣMx+ 1
2Mop
4 Least singular value circular law and Lindeberg exchange
or equivalentlyMop 6 2 sup
xisinΣMx
and so it suffices to show that
P(supxisinΣMx gt C
2radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx gt C2radicn)
The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that
(113) |Σ| 6 O(1)n
We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability
P(Mx gt C2radicn)
If we let Y1 Yn isin Rn be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 gtC2
4n
To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by
eminuscC2n4E exp(c
nsumj=1
|Yj middot x|2)
As the random vectors Y1 Yn are iid this is equal to
eminuscC2n4
(E exp(c|Y middot x|2)
)n
Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular one has
E exp(c|Y middot x|2) 1
if c gt 0 is a sufficiently small absolute constant This implies the bound
P(Mx gt C2radicn) exp(minuscC2n8 +O(n))
Taking C large enough we obtain the claim
Now we use the same method to establish a lower bound in the rectangularcase first established in [4]
Terence Tao 5
Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ
radicn
To prove this theorem we again use the ldquoepsilon net argumentrdquo We write
σp(M) = infxisinRpx=1
Mx
Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have
σp(M) gt infxisinΣMxminus εMop
and thus by (112) we have with exponentially high probability that
σp(M) gt infxisinΣMxminusCε
radicn
and so it suffices to show that
P( infxisinΣMx 6 2Cε
radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx 6 2Cεradicn)
The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that
(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n
We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability
P(Mx 6 2Cεradicn)
If we let Y1 Yn isin Rp be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 6 4C2ε2n
By Markovrsquos inequality the only way that this event can hold is if we have
|Yj middot x|2 6 8C2ε2δ
for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by
6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)
Now observe that the random variables Yj middot x are independent and so we canbound this expression by
6 2nP(|Y middot x| 6radic
8Cεδ12)(1minusδ2)n
where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
2 Least singular value circular law and Lindeberg exchange
at the ldquosoft edgerdquo of the spectrum For strongly rectangular matrices the tech-niques that are useful to control the latter can also control the former but the sit-uation becomes more delicate for square matrices For instance the ldquoepsilon netrdquomethod that is so useful for understanding the operator norm can control someldquolow entropyrdquo portions of the infimum that arise from ldquostructuredrdquo or ldquocompress-iblerdquo choices of x but are not able to control the ldquogenericrdquo or ldquoincompressiblerdquochoices of x for which new arguments will be needed Similarly the momentmethod can give the coarse order of magnitude (for instance for rectangular ma-trices with p = yn for 0 lt y lt 1 it gives an upper bound of (1 minus
radicy+ o(1))n for
the least singular value with high probability thanks to the Marcenko-Pastur law)but again this method begins to break down for square matrices although onecan make some partial headway by considering negative moments such as trMminus2though these are more difficult to compute than positive moments trMk
So one needs to supplement these existing methods with additional tools Itturns out that the key issue is to understand the distance between one of the nrows X1 Xn isin Cn of the matrix M and the hyperplane spanned by the othernminus1 rows The reason for this is as follows First suppose that σn(M) = 0 so thatM is non-invertible and there is a linear dependence between the rows X1 XnThus one of the Xi will lie in the hyperplane spanned by the other rows and soone of the distances mentioned above will vanish in fact one expects many ofthe n distances to vanish Conversely whenever one of these distances vanishesone has a linear dependence and so σn(M) = 0
More generally if the least singular value σn(M) is small one genericallyexpects many of these n distances to be small also and conversely Thus controlof the least singular value is morally equivalent to control of the distance betweena row Xi and the hyperplane spanned by the other rows This latter quantity isbasically the dot product of Xi with a unit normal ni of this hyperplane
When working with random matrices with jointly independent coefficients wehave the crucial property that the unit normal ni (which depends on all the rowsother than Xi) is independent of Xi so even after conditioning ni to be fixed theentries of Xi remain independent As such the dot product Xi middot ni is a familiarscalar random walk and can be controlled by a number of tools most notablyLittlewood-Offord theorems and the Berry-Esseacuteen central limit theorem As itturns out this type of control works well except in some rare cases in whichthe normal ni is ldquocompressiblerdquo or otherwise highly structured but epsilon-netarguments can be used to dispose of these cases3
These methods rely quite strongly on the joint independence on all the entriesit remains a challenge to extend them to more general settings Even for Wignermatrices the methods run into difficulty because of the non-independence of
3This general strategy was first developed for the technically simpler singularity problem in [43] andthen extended to the least singular value problem in [52]
Terence Tao 3
some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)
To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent
Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0
11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm
Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)
for some c gt 0) one has
(112) Mop = σ1(M) 6 Cradicn
for some absolute constant C
We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here
Proof We writeMop = sup
xisinRnx=1Mx
thus the failure event Mop gt Cradicn is the union of the events Mx gt C
radicn as
x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality
Mop = Mx 6 My+ Mopyminus x
and thusMop 6 sup
xisinΣMx+ 1
2Mop
4 Least singular value circular law and Lindeberg exchange
or equivalentlyMop 6 2 sup
xisinΣMx
and so it suffices to show that
P(supxisinΣMx gt C
2radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx gt C2radicn)
The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that
(113) |Σ| 6 O(1)n
We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability
P(Mx gt C2radicn)
If we let Y1 Yn isin Rn be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 gtC2
4n
To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by
eminuscC2n4E exp(c
nsumj=1
|Yj middot x|2)
As the random vectors Y1 Yn are iid this is equal to
eminuscC2n4
(E exp(c|Y middot x|2)
)n
Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular one has
E exp(c|Y middot x|2) 1
if c gt 0 is a sufficiently small absolute constant This implies the bound
P(Mx gt C2radicn) exp(minuscC2n8 +O(n))
Taking C large enough we obtain the claim
Now we use the same method to establish a lower bound in the rectangularcase first established in [4]
Terence Tao 5
Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ
radicn
To prove this theorem we again use the ldquoepsilon net argumentrdquo We write
σp(M) = infxisinRpx=1
Mx
Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have
σp(M) gt infxisinΣMxminus εMop
and thus by (112) we have with exponentially high probability that
σp(M) gt infxisinΣMxminusCε
radicn
and so it suffices to show that
P( infxisinΣMx 6 2Cε
radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx 6 2Cεradicn)
The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that
(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n
We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability
P(Mx 6 2Cεradicn)
If we let Y1 Yn isin Rp be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 6 4C2ε2n
By Markovrsquos inequality the only way that this event can hold is if we have
|Yj middot x|2 6 8C2ε2δ
for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by
6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)
Now observe that the random variables Yj middot x are independent and so we canbound this expression by
6 2nP(|Y middot x| 6radic
8Cεδ12)(1minusδ2)n
where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 3
some of the entries (although it turns out one can understand the least singularvalue in such cases by rather different methods)
To simplify the exposition we shall focus primarily here on just one specificensemble of random matrices the Bernoulli ensemble M = (ξij)16i6p16j6n ofrandom sign matrices where ξij = plusmn1 are independent Bernoulli signs Howeverthe results can extend to more general classes of random matrices with the mainrequirement being that the coefficients are jointly independent
Throughout these notes we use X Y Y X or X = O(Y) to denote a boundof the form |X| 6 CY for some absolute constant C we take n as an asymptoticparameter and write X = o(Y) to denote a bound of the form |X| 6 c(n)Y for somequantity c(n) that goes to zero as n goes to infinity (holding other parametersfixed) If C or c(n) needs to depend on additional parameters we will denotethis by subscripts eg X δ Y denotes a bound of the form X gt cδ|Y| for somecδ gt 0
11 The epsilon-net argument We begin by using the epsilon net argument toupper bound the operator norm
Theorem 111 (Upper bound for operator norm) Let M = (ξij)16ij6n16j6n bean ntimes n Bernoulli matrix Then with exponentially high probability (ie 1 minusO(eminuscn)
for some c gt 0) one has
(112) Mop = σ1(M) 6 Cradicn
for some absolute constant C
We remark that the above theorem in fact holds for any C gt 2 this follows forinstance from the work of Geman [32] combined with the Talagrand concentrationinequality (see Exercise 145 below) We will not use this improvement to thistheorem here
Proof We writeMop = sup
xisinRnx=1Mx
thus the failure event Mop gt Cradicn is the union of the events Mx gt C
radicn as
x ranges over the unit sphere One cannot apply the union bound immediatelybecause the number of points on the unit sphere is uncountable Instead we firstdiscretise the unit sphere using the ldquoepsilon net argumentrdquo Let Σ be a maximal12-net of the unit sphere in Rn that is to say a maximal 12-separated subsetof the sphere Observe that if x attains the supremum in the above equation andy is the nearest element of Σ to x then yminus x 6 12 and hence by the triangleinequality
Mop = Mx 6 My+ Mopyminus x
and thusMop 6 sup
xisinΣMx+ 1
2Mop
4 Least singular value circular law and Lindeberg exchange
or equivalentlyMop 6 2 sup
xisinΣMx
and so it suffices to show that
P(supxisinΣMx gt C
2radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx gt C2radicn)
The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that
(113) |Σ| 6 O(1)n
We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability
P(Mx gt C2radicn)
If we let Y1 Yn isin Rn be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 gtC2
4n
To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by
eminuscC2n4E exp(c
nsumj=1
|Yj middot x|2)
As the random vectors Y1 Yn are iid this is equal to
eminuscC2n4
(E exp(c|Y middot x|2)
)n
Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular one has
E exp(c|Y middot x|2) 1
if c gt 0 is a sufficiently small absolute constant This implies the bound
P(Mx gt C2radicn) exp(minuscC2n8 +O(n))
Taking C large enough we obtain the claim
Now we use the same method to establish a lower bound in the rectangularcase first established in [4]
Terence Tao 5
Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ
radicn
To prove this theorem we again use the ldquoepsilon net argumentrdquo We write
σp(M) = infxisinRpx=1
Mx
Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have
σp(M) gt infxisinΣMxminus εMop
and thus by (112) we have with exponentially high probability that
σp(M) gt infxisinΣMxminusCε
radicn
and so it suffices to show that
P( infxisinΣMx 6 2Cε
radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx 6 2Cεradicn)
The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that
(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n
We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability
P(Mx 6 2Cεradicn)
If we let Y1 Yn isin Rp be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 6 4C2ε2n
By Markovrsquos inequality the only way that this event can hold is if we have
|Yj middot x|2 6 8C2ε2δ
for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by
6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)
Now observe that the random variables Yj middot x are independent and so we canbound this expression by
6 2nP(|Y middot x| 6radic
8Cεδ12)(1minusδ2)n
where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
4 Least singular value circular law and Lindeberg exchange
or equivalentlyMop 6 2 sup
xisinΣMx
and so it suffices to show that
P(supxisinΣMx gt C
2radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx gt C2radicn)
The balls of radius 14 around each point in Σ are disjoint and lie in the 14-neighbourhood of the sphere From volume considerations we conclude that
(113) |Σ| 6 O(1)n
We set aside this bound as an ldquoentropy costrdquo to be paid later and focus on upperbounding for each x isin Σ the probability
P(Mx gt C2radicn)
If we let Y1 Yn isin Rn be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 gtC2
4n
To bound this expression we use the same exponential moment method usedto prove the Chernoff inequality Indeed for any parameter c gt 0 we can useMarkovrsquos inequality to bound the above by
eminuscC2n4E exp(c
nsumj=1
|Yj middot x|2)
As the random vectors Y1 Yn are iid this is equal to
eminuscC2n4
(E exp(c|Y middot x|2)
)n
Using the Chernoff inequality the vectors Y middot x are uniformly subgaussian in thesense that there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular one has
E exp(c|Y middot x|2) 1
if c gt 0 is a sufficiently small absolute constant This implies the bound
P(Mx gt C2radicn) exp(minuscC2n8 +O(n))
Taking C large enough we obtain the claim
Now we use the same method to establish a lower bound in the rectangularcase first established in [4]
Terence Tao 5
Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ
radicn
To prove this theorem we again use the ldquoepsilon net argumentrdquo We write
σp(M) = infxisinRpx=1
Mx
Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have
σp(M) gt infxisinΣMxminus εMop
and thus by (112) we have with exponentially high probability that
σp(M) gt infxisinΣMxminusCε
radicn
and so it suffices to show that
P( infxisinΣMx 6 2Cε
radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx 6 2Cεradicn)
The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that
(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n
We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability
P(Mx 6 2Cεradicn)
If we let Y1 Yn isin Rp be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 6 4C2ε2n
By Markovrsquos inequality the only way that this event can hold is if we have
|Yj middot x|2 6 8C2ε2δ
for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by
6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)
Now observe that the random variables Yj middot x are independent and so we canbound this expression by
6 2nP(|Y middot x| 6radic
8Cεδ12)(1minusδ2)n
where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 5
Theorem 114 (Lower bound) LetM = (ξij)16i6p16j6n be an ntimesp Bernoulli ma-trix where 1 6 p 6 (1minus δ)n for some δ gt 0 (independent of n) Then with exponentiallyhigh probability one has σp(M)δ
radicn
To prove this theorem we again use the ldquoepsilon net argumentrdquo We write
σp(M) = infxisinRpx=1
Mx
Let ε gt 0 be a parameter to be chosen later Let Σ be a maximal ε-net of the unitsphere in Rp Then we have
σp(M) gt infxisinΣMxminus εMop
and thus by (112) we have with exponentially high probability that
σp(M) gt infxisinΣMxminusCε
radicn
and so it suffices to show that
P( infxisinΣMx 6 2Cε
radicn)
is exponentially small in n From the union bound we can upper bound this bysumxisinΣ
P(Mx 6 2Cεradicn)
The balls of radius ε2 around each point in Σ are disjoint and lie in the ε2-neighbourhood of the sphere From volume considerations we conclude that
(115) |Σ| 6 O(1ε)p 6 O(1ε)(1minusδ)n
We again set aside this bound as an ldquoentropy costrdquo to be paid later and focus onupper bounding for each x isin Σ the probability
P(Mx 6 2Cεradicn)
If we let Y1 Yn isin Rp be the rows of M we can write this as
P
nsumj=1
|Yj middot x|2 6 4C2ε2n
By Markovrsquos inequality the only way that this event can hold is if we have
|Yj middot x|2 6 8C2ε2δ
for at least (1 minus δ2)n values of j The number of such sets of values is at most2n Applying the union bound (and paying the entropy cost of 2n) and usingsymmetry we may thus bound the above probability by
6 2nP(|Yj middot x|2 6 8C2ε2δ for 1 6 j 6 (1 minus δ2)n)
Now observe that the random variables Yj middot x are independent and so we canbound this expression by
6 2nP(|Y middot x| 6radic
8Cεδ12)(1minusδ2)n
where Y = (ξ1 ξn) is a random vector of iid Bernoulli signs
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
6 Least singular value circular law and Lindeberg exchange
We write x = (x1 xn) so that Y middot x is a random walk
Y middot x = ξ1x1 + middot middot middot+ ξnxn
To understand this walk we apply (a slight variant) of the Berry-Esseacuteen theorem
Exercise 116 Show4 that
supt
P(|Y middot xminus t| 6 r) r
x+
1x3
nsumj=1
|xj|3
for any r gt 0 and any non-zero xConclude in particular that if sum
j|xj|6ε
|xj|2 gt η
for some η gt 0 thensupt
P(|Y middot xminus t| 6radic
8Cε)η ε
(Hint condition out all the xj with |xj| gt ε)
Let us temporarily call x incompressible ifsumj|xj|6ε
|xj|2 gt η
and compressible otherwise where η gt 0 is a parameter (independent of n) to bechosen later If we only look at the incompressible elements of Σ we can nowbound
P(Mx 6 2Cεradicn) Oη(ε)
(1minusδ2)n
and comparing this against the entropy cost (115) we obtain an acceptable contri-bution for ε small enough (here we are crucially using the rectangular conditionp 6 (1 minus δ)n)
It remains to deal with the compressible vectors Observe that such vectors liewithin η of a sparse unit vector which is only supported in at most εminus2 positionsThe η-entropy of these sparse vectors (ie the number of balls of radius η neededto cover this space) can easily be computed to be of polynomial size O(nOεη(1))
in n thus if η is chosen to be less than ε2 the number of compressible vectorsin Σ is also O(nOεη(1)) Meanwhile we have the following crude bound
Exercise 117 For any unit vector x show that
P(|Y middot x| 6 κ) 6 1 minus κ
for κ gt 0 a small enough absolute constant (Hint Use the Paley-Zygmund in-equality P(Z gt θEZ) gt (1minus θ)2 (EZ)2
E(Z2) valid for any non-negative random variable
Z of finite non-zero variance and any 0 6 θ 6 1 Bounds on higher moments
4Actually for the purposes of this section it would suffice to establish a weaker form of the Berry-Esseacuteen theorem with
sumnj=1 |xj|
3x3 replaced by (sum3j=1 |xj|
3x3)c for any fixed c gt 0 This canfor instance be done using the Lindeberg exchange method discussed in Section 3
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 7
on |Y middot x| can be obtained for instance using Hoeffdingrsquos inequality or by directcomputation) Use this to show that
P(Mx 6 αradicn) exp(minuscn)
for all such x and a sufficiently small absolute constant α gt 0 with c gt 0 inde-pendent of α and n
Thus the compressible vectors give a net contribution ofO(nOεη(1))timesexp(minuscn)which is acceptable This concludes the proof of Theorem 114
12 Singularity probability Now we turn to square iid matrices Before weinvestigate the size of the least singular value of M we first tackle the easierproblem of bounding the singularity probability
P(σn(M) = 0)
ie the probability that M is not invertible The problem of computing thisprobability exactly is still not completely settled Since M is singular wheneverthe first two rows (say) are identical we obtain a lower bound
P(σn(M) = 0) gt1
2n
and it is conjectured that this bound is essentially tight in the sense that
P(σn(M) = 0) =(
12+ o(1)
)n
but this remains open the best bound currently is [18] and gives
P(σn(M) = 0) 6(
1radic2+ o(1)
)n
We will not prove this bound here but content ourselves with a weaker boundessentially due to Komloacutes [43]
Proposition 121 We have P(σn(M) = 0) 1n12
To show this we need the following combinatorial fact due to Erdoumls [25]
Proposition 122 (Erdoumls Littlewood-Offord theorem) Let x = (x1 xn) be avector with at least k nonzero entries and let Y = (ξ1 ξn) be a random vector of iidBernoulli signs Then P(Y middot x = 0) kminus12
Proof By taking real and imaginary parts we may assume that x is real Byeliminating zero coefficients of x we may assume that k = n reflecting we maythen assume that all the xi are positive Observe that the set of Y = (ξ1 ξn) isinminus1 1n with Y middot x = 0 forms an antichain5 The product partial ordering onminus1 1n is defined by requiring (x1 xn) 6 (y1 yn) iff xi 6 yi for all iOn the other hand Spernerrsquos theorem asserts that all anti-chains in minus1 1n havecardinality at most
(nbn2c
) in minus1 1n with the product partial ordering The
claim now easily follows from this theorem and Stirlingrsquos formula
5An antichain in a partially ordered set X is a subset S of X such that no two elements in S arecomparable in the order
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
8 Least singular value circular law and Lindeberg exchange
Note that we also have the obvious bound
(123) P(Y middot x = 0) 6 12
for any non-zero xNow we prove the proposition In analogy with the arguments of Section 11
we writeP(σn(M) = 0) = P(Mx = 0 for some nonzero x isin Cn)
(actually we can take x isin Rn or even x isin Zn since M is integer-valued) Wedivide into compressible and incompressible vectors as before but our definitionof compressibility and incompressibility is slightly different now Also one hasto do a certain amount of technical maneuvering in order to preserve the crucialindependence between rows and columns
Namely we pick an ε gt 0 and call x compressible (or more precisely sparse) if itis supported on at most εn coordinates and incompressible otherwise
Let us first consider the contribution of the event thatMx = 0 for some nonzerocompressible x Pick an x with this property which is as sparse as possible sayk-sparse for some 1 6 k lt εn Let us temporarily fix k By paying an entropy costof bεnc
(nk
) we may assume that it is the first k entries that are non-zero for some
1 6 k 6 εn This implies that the first k columns Y1 Yk of M have a linear de-pendence given by x by minimality Y1 Ykminus1 are linearly independent Thusx is uniquely determined (up to scalar multiples) by Y1 Yk Furthermore asthe ntimes k matrix formed by Y1 Yk has rank kminus 1 there is some ktimes k minorwhich already determines x up to constants by paying another entropy cost of(nk
) we may assume that it is the top left minor which does this In particular we
can now use the first k rows X1 Xk to determine x up to constants But theremaining nminus k rows are independent of X1 Xk and still need to be orthog-onal to x by Proposition 122and (123) this happens with probability at mostmin(12O(1
radick))(nminusk) giving a total cost ofsum
16k6εn
bεnc(n
k
)2min(12O(1
radick))minus(nminusk)
which by Stirlingrsquos formula is acceptable (in fact this gives an exponentially smallcontribution)
The same argument gives that the event that ylowastM = 0 for some nonzero com-pressible y also has exponentially small probability The only remaining event tocontrol is the event that Mx = 0 for some incompressible x but that Mz 6= 0 andylowastM 6= 0 for all nonzero compressible zy Call this event E
Since Mx = 0 for some incompressible x we see that for at least εn values ofk isin 1 n the column Yk lies in the vector space Vk spanned by the remainingnminus 1 rows of M Let Ek denote the event that E holds and that Yk lies in Vk
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 9
then we see from double counting that
P(E) 61εn
nsumk=1
P(Ek)
By symmetry we thus haveP(E) 6
1ε
P(En)
To compute P(En) we freeze Y1 Ynminus1 consider a normal vector x to Vnminus1note that we can select x depending only on Y1 Ynminus1 We may assume thatan incompressible normal vector exists since otherwise the event En would beempty We make the crucial observation that Yn is still independent of x ByProposition 122 we thus see that the conditional probability that Yn middot x = 0 forfixed Y1 Ynminus1 is Oε(nminus12) We thus see that P(E)ε 1n12 and the claimfollows
Remark 124 Further progress has been made on this problem by a finer anal-ysis of the concentration probability P(Y middot x = 0) and in particular in classifyingthose x for which this concentration probability is large (this is known as the in-verse Littlewood-Offord problem) Important breakthroughs in this direction weremade by Halaacutesz [38] (introducing Fourier-analytic tools) and by Kahn Komloacutesand Szemereacutedi [40] (introducing an efficient ldquoswappingrdquo argument) In [57] toolsfrom additive combinatorics (such as Freimanrsquos theorem) were introduced to ob-tain further improvements leading eventually to the results from [18] mentionedearlier
13 Lower bound for the least singular value Now we return to the least singu-lar value σn(M) of an iid Bernoulli matrix and establish a lower bound Giventhat there are n singular values between 0 and σ1(M) which is typically of sizeO(radicn) one expects the least singular value to be of size about 1
radicn on the av-
erage Another argument supporting this heuristic scomes from the followingidentity
Exercise 131 (Negative second moment identity) Let M be an invertible ntimes nmatrix let X1 Xn be the rows of M and let C1 Cn be the columns ofMminus1 For each 1 6 i 6 n let Vi be the hyperplane spanned by all the rowsX1 Xn other than Xi Show that Ci = dist(XiVi)minus1 and
sumni=1 σi(M)minus2 =sumn
i=1 dist(XiVi)minus2
The expression dist(XiVi)2 has a mean comparable to 1 which suggests thatdist(XiVi)minus2 is typically of size O(1) which from the negative second momentidentity suggests that
sumni=1 σi(M)minus2 = O(n) this is consistent with the heuris-
tic that the eigenvalues σi(M) should be roughly evenly spaced in the interval[0 2radicn] (so that σnminusi(M) should be about (i+ 1)
radicn)
Now we give a rigorous lower bound
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
10 Least singular value circular law and Lindeberg exchange
Theorem 132 (Lower tail estimate for the least singular value) For any λ gt 0 onehas
P(σn(M) 6 λradicn) oλrarr0(1) + onrarrinfinλ(1)
where oλrarr0(1) goes to zero as λ rarr 0 uniformly in n and onrarrinfinλ(1) goes to zero asnrarrinfin for each fixed λ
This is a weaker form of a result of Rudelson and Vershynin [53] (which obtainsa bound of the form O(λ) +O(cn) for some c lt 1) which builds upon the earlierworks [52] [59] which obtained variants of the above result
The scale 1radicn that we are working at here is too fine to use epsilon net ar-
guments (unless one has a lot of control on the entropy which can be obtainedin some cases thanks to powerful inverse Littlewood-Offord theorems but is dif-ficult to obtain in general) We can prove this theorem along similar lines to thearguments in the previous section we sketch the method as follows We can takeλ to be small We write the probability to be estimated as
P(Mx 6 λradicn for some unit vector x isin Cn)
We can assume that Mop 6 Cradicn for some absolute constant C as the event
that this fails has exponentially small probabilityWe pick an ε gt 0 (not depending on λ) to be chosen later We call a unit vector
x isin Cn compressible if x lies within a distance ε of a εn-sparse vector Let us firstdispose of the case in which Mx 6 λ
radicn for some compressible x In fact we
can even dispose of the larger event that Mx 6 λradicn for some compressible x
By paying an entropy cost of(nbεnc
) we may assume that x is within ε of a vector
y supported in the first bεnc coordinates Using the operator norm bound on Mand the triangle inequality we conclude that
My 6 (λ+Cε)radicn
Since y has norm comparable to 1 this implies that the least singular value ofthe first bεnc columns of M is O((λ+ ε)
radicn) But by Theorem 114 this occurs
with probability O(exp(minuscn)) (if λ ε are small enough) So the total probabilityof the compressible event is at most
(nbεnc
)O(exp(minuscn)) which is acceptable if ε
is small enoughThus we may assume now that Mx gt λ
radicn for all compressible unit vectors
x we may similarly assume that ylowastM gt λradicn for all compressible unit vectors
y Indeed we may also assume that ylowastMi gt λradicn for every i where Mi is M
with the ith column removedThe remaining case is if Mx 6 λ
radicn for some incompressible x Let us call
this event E Write x = (x1 xn) and let Y1 Yn be the column of M thus
x1Y1 + middot middot middot+ xnYn 6 λradicn
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 11
Letting Wi be the subspace spanned by all the Y1 Yn except for Yi we con-clude upon projecting to the orthogonal complement of Wi that
|xi|dist(YiWi) 6 λradicn
for all i (compare with Exercise 131) On the other hand since x is incompress-ible we see that |xi| gt ε
radicn for at least εn values of i and thus
(133) dist(YiWi) 6 λε
for at least εn values of i If we let Ei be the event that E and (133) both holdwe thus have from double-counting that
P(E) 61εn
nsumi=1
P(Ei)
and thus by symmetryP(E) 6
1ε
P(En)
(say) However if En holds then setting y to be a unit normal vector toWi (whichis necessarily incompressible by the hypothesis on Mi) we have
|Yi middot y| 6 λε
Again the crucial point is that Yi and y are independent The incompressibilityof y combined with a Berry-Esseacuteen type theorem then gives
Exercise 134 Show thatP(|Yi middot y| 6 λε) ε2
(say) if λ is sufficiently small depending on ε and n is sufficiently large depend-ing on ε
This gives a bound of O(ε) for P(E) if λ is small enough depending on ε andn is large enough this gives the claim
Remark 135 By refining these arguments one can obtain an estimate of the form
(136) σn(1radicnMn minus zI) gt nminusA
for any givenA gt 0 and z of polynomial size in n with a failure probability that isof the formO(nminusB) for some B gt 0 By using inverse Littlewood-Offord theoremsrather than the Berry-Esseacuteen theorem one can make B arbitrarily large (if A issufficiently large depending on A) which is important for several applicationsThere are several results of this type with overlapping ranges of generality (andvarious values of A) [36 51 58] and the exponent A is known to degrade if onehas too few moment assumptions on the underlying random matrixM This typeof result (with an unspecified A) is important for the circular law discussed inthe next section
14 Upper bound for the least singular value One can complement the lowertail estimate with an upper tail estimate
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
12 Least singular value circular law and Lindeberg exchange
Theorem 141 (Upper tail estimate for the least singular value) For any λ gt 0one has
(142) P(σn(M) gt λradicn) oλrarrinfin(1) + onrarrinfinλ(1)
We prove this using an argument of Rudelson and Vershynin [54] Supposethat σn(M) gt λ
radicn then
(143) ylowastMminus1 6radicnyλ
for all yNext let X1 Xn be the rows of M and let C1 Cn be the columns of
Mminus1 thus C1 Cn is a dual basis for X1 Xn From (143) we havensumi=1
|y middotCi|2 6 ny2λ2
We apply this with y equal to Xnminusπn(Xn) where πn is the orthogonal projectionto the space Vnminus1 spanned by X1 Xnminus1 On the one hand we have
y2 = dist(XnVnminus1)2
and on the other hand we have for any 1 6 i lt n that
y middotCi = minusπn(Xn) middotCi = minusXn middot πn(Ci)
and so
(144)nminus1sumi=1
|Xn middot πn(Ci)|2 6 ndist(XnVnminus1)2λ2
If (144) holds then |Xn middotπn(Ci)|2 = O(dist(XnVnminus1)2λ2) for at least half of the
i so the probability in (142) can be bounded by
1n
nminus1sumi=1
P(|Xn middot πn(Ci)|2 = O(dist(XnVnminus1)2λ2))
which by symmetry can be bounded by
P(|Xn middot πn(C1)|2 = O(dist(XnVnminus1)
2λ2))
Let ε gt 0 be a small quantity to be chosen later We now need an application ofTalagrandrsquos inequality
Exercise 145 Talagrandrsquos inequality [55] implies that if A is a convex subset ofRn At is the t-neighbourhood of A in the Euclidean metric for some t gt 0 andXn is a random Bernoulli vector then
P(Xn isin A)P(Xn 6isin At) 6 exp(minusct2)
for some absolute constant c gt 0 Use this to show that for any d-dimensionalsubspace V of Rn we have
P(|dist(XnV) minusradicnminus d| gt t) exp(minusct2)
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 13
for some absolute constant t (Hint one will need to combine Talagrandrsquos in-equality with a computation of E dist(XnV)2)
From Exercise 145 we know that dist(XnVnminus1) = Oε(1) with probability1 minusO(ε) so we obtain a bound of
P(Xn middot πn(C1) = Oε(1λ)) +O(ε)
Now a key point is that the vectors πn(C1) πn(Cnminus1) depend only onX1 Xnminus1 and not on Xn indeed they are the dual basis for X1 Xnminus1 inVnminus1 Thus after conditioning X1 Xnminus1 and thus πn(C1) to be fixed Xn isstill a Bernoulli random vector Applying a Berry-Esseacuteen inequality we obtaina bound of O(ε) for the conditional probability that Xn middot πn(C1) = Oε(1λ) forλ sufficiently large depending on ε unless πn(C1) is compressible (in the sensethat say it is within ε of an εn-sparse vector) But this latter possibility can becontrolled (with exponentially small probability) by the same type of argumentsas before we omit the details
We remark that stronger bounds for the upper tail have recently been obtainedin [48]
15 Asymptotic for the least singular value The distribution of singular valuesof a Gaussian random matrix can be computed explicitly In particular if M isa real Gaussian matrix (with all entries iid with distribution N(0 1)R) it wasshown in [23] that
radicnσn(M) converges in distribution to the distribution microE =
1+radicx
2radicxeminusx2minus
radicx dx as n rarr infin It turns out that this result can be extended to
other ensembles with the same mean and variance In particular we have thefollowing result from [60]
Theorem 151 If M is an iid Bernoulli matrix thenradicnσn(M) also converges in
distribution to microE as nrarrinfin (In fact there is a polynomial rate of convergence)
This should be compared with Theorems 132 141 which show thatradicnσn(M)
have a tight sequence of distributions in (0+infin) The arguments from [60] thusprovide an alternate proof of these two theorems The same result in fact holdsfor all iid ensembles obeying a finite moment condition
The arguments used to prove Theorem 151 do not establish the limit microE di-rectly but instead use the result of [23] as a black box focusing instead on estab-lishing the universality of the limiting distribution of
radicnσn(M) and in particular
that this limiting distribution is the same whether one has a Bernoulli ensembleor a Gaussian ensemble
The arguments are somewhat technical and we will not present them in fullhere but instead give a sketch of the key ideas
In previous sections we have already seen the close relationship between theleast singular value σn(M) and the distances dist(XiVi) between a row Xi ofM and the hyperplane Vi spanned by the other nminus 1 rows It is not hard to use
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
14 Least singular value circular law and Lindeberg exchange
the above machinery to show that as n rarr infin dist(XiVi) converges in distribu-tion to the absolute value |N(0 1)R| of a Gaussian regardless of the underlyingdistribution of the coefficients of M (ie it is asymptotically universal) The basicpoint is that one can write dist(XiVi) as |Xi middot ni| where ni is a unit normal ofVi (we will assume here that M is non-singular which by previous argumentsis true asymptotically almost surely) The previous machinery lets us show thatni is incompressible with high probability and the claim then follows from theBerry-Esseacuteen theorem
Unfortunately despite the presence of suggestive relationships such as Exercise131 the asymptotic universality of the distances dist(XiVi) does not directlyimply asymptotic universality of the least singular value However it turns outthat one can obtain a higher-dimensional version of the universality of the scalarquantities dist(XiVi) as follows For any small k (say 1 6 k 6 nc for somesmall c gt 0) and any distinct i1 ik isin 1 n a modification of the aboveargument shows that the covariance matrix
(152) (π(Xia) middot π(Xib))16ab6k
of the orthogonal projections π(Xi1) π(Xik) of the k rows Xi1 Xik to thecomplement Vperpi1ik
of the space Vi1ik spanned by the other n minus k rows ofM is also universal converging in distribution to the covariance6 matrix (Ga middotGb)16ab6k of k iid Gaussians Ga equiv N(0 1)R (note that the convergence ofdist(XiVi) to |N(0 1)R| is the k = 1 case of this claim) The key point is that onecan show that the complement Vperpi1ik
is usually ldquoincompressiblerdquo in a certaintechnical sense which implies that the projections π(Xia) behave like iid Gaus-sians on that projection thanks to a multidimensional Berry-Esseacuteen theorem
On the other hand the covariance matrix (152) is closely related to the inversematrix Mminus1
Exercise 153 Show that (152) is also equal to AlowastA where A is the ntimes k matrixformed from the i1 ik columns of Mminus1
In particular this shows that the singular values of k randomly selected columnsof Mminus1 have a universal distribution
Recall our goal is to show thatradicnσn(M) has an asymptotically universal dis-
tribution which is equivalent to asking that 1radicnMminus1op has an asymptotically
universal distribution The goal is then to extract the operator norm of Mminus1 fromlooking at a random ntimes k minor B of this matrix This comes from the followingapplication of the second moment method
Exercise 154 Let A be an ntimes n matrix with columns R1 Rn and let B bethe ntimes k matrix formed by taking k of the columns R1 Rn at random Show
6These covariance matrix distributions are also known as Wishart distributions
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 15
that
EAlowastAminusn
kBlowastB2
F 6n
k
nsumk=1
Rk4
where F is the Frobenius norm AF = tr(AlowastA)12
Recall from Exercise 131 that Rk = 1dist(XkVk) so we expect each Rkto have magnitude aboutO(1) As such we expect σ1((M
minus1)lowast(Mminus1)) = σn(M)minus2
to differ byO(n2k) from nkσ1(B
lowastB) = nkσ1(B)
2 In principle this gives us asymp-totic universality on
radicnσn(M) from the already established universality of B
There is one technical obstacle remaining however while we know that eachdist(XkVk) is distributed like a Gaussian so that each individual Rk is going tobe of size O(1) with reasonably good probability in order for the above exerciseto be useful one needs to bound all of the Rk simultaneously with high probabilityA naive application of the union bound leads to terrible results here Fortunatelythere is a strong correlation between the Rk they tend to be large together orsmall together or equivalently that the distances dist(XkVk) tend to be smalltogether or large together Here is one indication of this
Lemma 155 For any 1 6 k lt i 6 n one has
dist(XiVi) gtπi(Xi)
1 +sumkj=1
πi(Xj)πi(Xi)dist(XjVj)
where πi is the orthogonal projection onto the space spanned by X1 XkXi
Proof We may relabel so that i = k+ 1 then projecting everything by πi we mayassume that n = k+ 1 Our goal is now to show that
dist(XnVnminus1) gtXn
1 +sumnminus1j=1
XjXndist(XjVj)
Recall that R1 Rn is a dual basis to X1 Xn This implies in particular that
x =
nsumj=1
(x middotXj)Rj
for any vector x applying this to Xn we obtain
Xn = Xn2Rn +
nminus1sumj=1
(Xj middotXn)Rj
and hence by the triangle inequality
Xn2Rn 6 Xn+nminus1sumj=1
XjXnRj
Using the fact that Rj = 1dist(XjRj) the claim follows
In practice once k gets moderately large (eg k = nc for some small c gt 0)one can control the expressions πi(Xj) appearing here by Talagrandrsquos inequality
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
16 Least singular value circular law and Lindeberg exchange
(Exercise 145) and so this inequality tells us that once dist(XjVj) is boundedaway from zero for j = 1 k it is bounded away from zero for all other k alsoThis turns out to be enough to get enough uniform control on the Rj to makeExercise 154 useful and ultimately to complete the proof of Theorem 151
2 The circular law
In this section we leave the realm of self-adjoint matrix ensembles such asWigner random matrices and consider instead the simplest examples of non-self-adjoint ensembles namely the iid matrix ensembles
The basic result in this area is
Theorem 201 (Circular law) Let Mn be an n times n iid matrix whose entries ξij1 6 i j 6 n are iid with a fixed (complex) distribution ξij equiv ξ of mean zero andvariance one Then the spectral measure micro 1radic
nMn
converges (in the vague topology) both
in probability and almost surely to the circular law microcirc = 1π1|x|2+|y|261 dxdy where
xy are the real and imaginary coordinates of the complex plane
This theorem has a long history it is analogous to the semicircular law butthe non-Hermitian nature of the matrices makes the spectrum so unstable thatkey techniques that are used in the semicircular case such as truncation andthe moment method no longer work significant new ideas are required Inthe case of random Gaussian matrices (and using the expected spectral measurerather than the actual spectral measure) this result was established by Ginibre[33] (in the complex case) and by Edelman [22] (in the real case) using the explicitformulae for the joint distribution of eigenvalues available in these cases In 1984Girko [34] laid out a general strategy for establishing the result for non-gaussianmatrices which formed the base of all future work on the subject however akey ingredient in the argument namely a bound on the least singular value ofshifts 1radic
nMn minus zI was not fully justified at the time A rigorous proof of the
circular law was then established by Bai [9] assuming additional moment andboundedness conditions on the individual entries These additional conditionswere then slowly removed in a sequence of papers [35 36 51 58] with the lastmoment condition being removed in [63] There have since been several furtherworks in which the circular law was extended to other ensembles [1ndash3510ndash132021 47 50 67] including several models in which the entries are no longer jointlyindependent see also the surveys [14 58] The method also applies to variousensembles with a different limiting law than the circular law see eg [49] [37]More recently local circular laws have also been established for various models[5 16 17 65 68]
21 Spectral instability One of the basic difficulties present in the non-Hermitiancase is spectral instability small perturbations in a large matrix can lead to large
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 17
fluctuations in the spectrum In order for any sort of analytic technique to beeffective this type of instability must somehow be precluded
The canonical example of spectral instability comes from perturbing the rightshift matrix
U0 =
0 1 0 0
0 0 1 0
0 0 0 0
to the matrix
Uε =
0 1 0 0
0 0 1 0
ε 0 0 0
for some ε gt 0
The matrix U0 is nilpotent Un0 = 0 Its characteristic polynomial is (minusλ)nand it thus has n repeated eigenvalues at the origin In contrast Uε obeys theequation Unε = εI its characteristic polynomial is (minusλ)n minus ε(minus1)n and it thushas n eigenvalues at the nth roots ε1ne2πijn j = 0 nminus 1 of ε Thus evenfor exponentially small values of ε say ε = 2minusn the eigenvalues for Uε can bequite far from the eigenvalues of U0 and can wander all over the unit disk Thisis in sharp contrast with the Hermitian case where eigenvalue inequalities suchas the Weyl inequalities or Wielandt-Hoffman inequalities ensure stability of thespectrum
One can explain the problem in terms of pseudospectrum7 The only spectrumof U is at the origin so the resolvents (U0 minus zI)
minus1 of U0 are finite for all non-zeroz However while these resolvents are finite they can be extremely large Indeedfrom the nilpotent nature of U0 we have the Neumann series
(U0 minus zI)minus1 = minus
1zminusU0
z2 minus minusUnminus1
0zn
so for |z| lt 1 we see that the resolvent has size roughly |z|minusn which is expo-nentially large in the interior of the unit disk This exponentially large size ofresolvent is consistent with the exponential instability of the spectrum
Exercise 211 Let M be a square matrix and let z be a complex number Showthat (Mminus zI)minus1op gt R if and only if there exists a perturbation M+ E of Mwith Eop 6 1R such that M+ E has z as an eigenvalue
This already hints strongly that if one wants to rigorously prove control on thespectrum of M near z one needs some sort of upper bound on (Mminus zI)minus1op
7The pseudospectrum of an operator T is the set of complex numbers z for which the operator norm(T minus zI)minus1op is either infinite or larger than a fixed threshold 1ε See [66] for further discussion
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
18 Least singular value circular law and Lindeberg exchange
or equivalently one needs some sort of lower bound on the least singular valueσn(Mminus zI) of Mminus zI
Without such a bound though the instability precludes the direct use of thetruncation method which was so useful in the Hermitian case In particular thereis no obvious way to reduce the proof of the circular law to the case of boundedcoefficients in contrast to the semicircular law for Wigner matrices where thisreduction follows easily from the Wielandt-Hoffman inequality Instead we mustcontinue working with unbounded random variables throughout the argument(unless of course one makes an additional decay hypothesis such as assumingcertain moments are finite this helps explain the presence of such moment con-ditions in many papers on the circular law)
22 Incompleteness of the moment method In the Hermitian case the mo-ments
1n
tr(1radicnM)k =
intR
xk dmicro 1radicnMn
(x)
of a matrix can be used (in principle) to understand the distribution micro 1radicnMn
completely (at least when the measure micro 1radicnMn
has sufficient decay at infinity)This is ultimately because the space of real polynomials P(x) is dense in variousfunction spaces (the Weierstrass approximation theorem)
In the non-Hermitian case the spectral measure micro 1radicnMn
is now supported onthe complex plane rather than the real line One still has the formula
1n
tr(1radicnM)k =
intC
zk dmicro 1radicnMn
(z)
but it is much less useful now because the space of complex polynomials P(z) nolonger has any good density properties8 In particular the moments no longeruniquely determine the spectral measure
This can be illustrated with the shift examples given above It is easy to seethat U and Uε have vanishing moments up to (nminus 1)th order ie
1n
tr(
1radicnU
)k=
1n
tr(
1radicnUε
)k= 0
for k = 1 nminus 1 Thus we haveintC
zk dmicro 1radicnU
(z) =
intC
zk dmicro 1radicnUε
(z) = 0
for k = 1 nminus 1 Despite this enormous number of matching moments thespectral measures micro 1radic
nU
and micro 1radicnUε
are dramatically different the former is aDirac mass at the origin while the latter can be arbitrarily close to the unit circleIndeed even if we set all moments equal to zeroint
C
zk dmicro = 0
8For instance the uniform closure of the space of polynomials on the unit disk is not the space ofcontinuous functions but rather the space of holomorphic functions that are continuous on the closedunit disk
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 19
for k = 1 2 then there are an uncountable number of possible (continuous)probability measures that could still be the (asymptotic) spectral measure micro forinstance any measure which is rotationally symmetric around the origin wouldobey these conditions
If one could somehow control the mixed momentsintC
zkzl dmicro 1radicnMn
(z) =1n
nsumj=1
(1radicnλj(Mn)
)k( 1radicnλj(Mn)
)lof the spectral measure then this problem would be resolved and one coulduse the moment method to reconstruct the spectral measure accurately Howeverthere does not appear to be any easy way to compute this quantity the obviousguess of 1
n tr( 1radicnMn)
k( 1radicnMlowastn)
l works when the matrix Mn is normal as Mnand Mlowastn then share the same basis of eigenvectors but generically one does notexpect these matrices to be normal
Remark 221 The failure of the moment method to control the spectral measureis consistent with the instability of spectral measure with respect to perturbationsbecause moments are stable with respect to perturbations
Exercise 222 Let k gt 1 be an integer and let Mn be an iid matrix whose entrieshave a fixed distribution ξ with mean zero variance 1 and with kth momentfinite Show that 1
n tr( 1radicnMn)
k converges to zero as n rarr infin in expectation inprobability and in the almost sure sense Thus we see that
intC zk dmicro 1radic
nMn
(z)
converges to zero in these three senses also This is of course consistent withthe circular law but does not come close to establishing that law for the reasonsgiven above
Remark 223 The failure of the moment method also shows that methods offree probability (see eg [46]) do not work directly For instance observe thatfor fixed ε U0 and Uε (in the noncommutative probability space (Matn(C) 1
n tr))both converge in the sense of lowast-moments as n rarr infin to that of the right shiftoperator on `2(Z) (with the trace τ(T) = 〈e0 Te0〉 with e0 being the Kroneckerdelta at 0) but the spectral measures of U0 and Uε are different Thus the spectralmeasure cannot be read off directly from the free probability limit
23 The logarithmic potential With the moment method out of considerationattention naturally turns to the Stieltjes transform
sn(z) =1n
tr(
1radicnMn minus zI
)minus1=
intC
dmicro 1radicnMn
(w)
wminus z
This is a rational function on the complex plane Its relationship with the spectralmeasure is as follows
Exercise 231 Show thatmicro 1radic
nMn
=1πpartzsn(z)
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
20 Least singular value circular law and Lindeberg exchange
in the sense of distributions where
partz =12(part
partx+ i
part
party)
is the Cauchy-Riemann operator and the Stieltjes transform is interpreted distri-butionally in a principal value sense
One can control the Stieltjes transform quite effectively away from the originIndeed for iid matrices with subgaussian entries one can show that the spectralradius of 1radic
nMn is 1+o(1) almost surely this combined with (222) and Laurent
expansion tells us that sn(z) almost surely converges to minus1z locally uniformlyin the region z |z| gt 1 and that the spectral measure micro 1radic
nMn
converges almostsurely to zero in this region (which can of course also be deduced directly fromthe spectral radius bound) This is of course consistent with the circular law butis not sufficient to prove it (for instance the above information is also consistentwith the scenario in which the spectral measure collapses towards the origin)One also needs to control the Stieltjes transform inside the disk z |z| 6 1 inorder to fully control the spectral measure
For this many existing methods for controlling the Stieltjes transform are notparticularly effective in this non-Hermitian setting (mainly because of the spectralinstability and also because of the lack of analyticity in the interior of the spec-trum) Instead one proceeds by relating the Stieltjes transform to the logarithmicpotential
fn(z) =
intC
log |wminus z|dmicro 1radicnMn
(w)
It is easy to see that sn(z) is essentially the (distributional) gradient of fn(z)
sn(z) =
(minuspart
partx+ i
part
party
)fn(z)
and thus gn is related to the spectral measure by the distributional formula9
(232) micro 1radicnMn
=1
2π∆fn
where ∆ = part2
partx2 + part2
party2 is the LaplacianThe following basic result relates the logarithmic potential to probabilistic no-
tions of convergence
Theorem 233 (Logarithmic potential continuity theorem) LetMn be a sequence ofrandom matrices and suppose that for almost every complex number z fn(z) convergesalmost surely (resp in probability) to
f(z) =
intC
log |zminusw|dmicro(w)
for some probability measure micro Then micro 1radicnMn
converges almost surely (resp in probabil-ity) to micro in the vague topology
Proof We prove the almost sure version of this theorem and leave the conver-gence in probability version as an exercise
9This formula just reflects the fact that 12π log |z| is the Newtonian potential in two dimensions
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 21
On any bounded set K in the complex plane the functions log | middotminusw| lie in L2(K)
uniformly in w From Minkowskirsquos integral inequality we conclude that the fnand f are uniformly bounded in L2(K) On the other hand almost surely the fnconverge pointwise to f From the dominated convergence theorem this impliesthat min(|fn minus f|M) converges in L1(K) to zero for any M using the uniformbound in L2(K) to compare min(|fn minus f|M) with |fn minus f| and then sending Mrarrinfin we conclude that fn converges to f in L1(K) In particular fn converges to f inthe sense of distributions taking distributional Laplacians using (232) we obtainthe claim
Exercise 234 Establish the convergence in probability version of Theorem 233
Thus the task of establishing the circular law then reduces to showing foralmost every z that the logarithmic potential fn(z) converges (in probability oralmost surely) to the right limit f(z)
Observe that the logarithmic potential
fn(z) =1n
nsumj=1
log∣∣∣∣λj(Mn)radic
nminus z
∣∣∣∣can be rewritten as a log-determinant
fn(z) =1n
log∣∣∣∣det(
1radicnMn minus zI)
∣∣∣∣ To compute this determinant we recall that the determinant of a matrix A is notonly the product of its eigenvalues but also has a magnitude equal to the productof its singular values
|detA| =nprodj=1
σj(A) =
nprodj=1
λj(AlowastA)12
and thusfn(z) =
12
intinfin0
log x dνnz(x)
where dνnz is the spectral measure of the matrix(
1radicnMn minus zI
)lowast (1radicnMn minus zI
)
The advantage of working with this spectral measure as opposed to the orig-inal spectral measure micro 1radic
nMn
is that the matrix ( 1radicnMn minus zI)lowast( 1radic
nMn minus zI) is
self-adjoint and so methods such as the moment method or free probability cannow be safely applied to compute the limiting spectral distribution Indeed Girko[34] established that for almost every z νnz converged both in probability andalmost surely to an explicit (though slightly complicated) limiting measure νz inthe vague topology Formally this implied that fn(z) would converge pointwise(almost surely and in probability) to
12
intinfin0
log x dνz(x)
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
22 Least singular value circular law and Lindeberg exchange
A lengthy but straightforward computation then showed that this expression wasindeed the logarithmic potential f(z) of the circular measure microcirc so that thecircular law would then follow from the logarithmic potential continuity theorem
Unfortunately the vague convergence of νnz to νz only allows one to deducethe convergence of
intinfin0 F(x) dνnz to
intinfin0 F(x) dνz for F continuous and compactly
supported The logarithm function log x has singularities at zero and at infinityand so the convergenceintinfin
0log x dνnz(x)rarr
intinfin0
log x dνz(x)
can fail if the spectral measure νnz sends too much of its mass to zero or toinfinity
The latter scenario can be easily excluded either by using operator normbounds on Mn (when one has enough moment conditions) or even just theFrobenius norm bounds (which require no moment conditions beyond the unitvariance) The real difficulty is with preventing mass from going to the origin
The approach of Bai [9] proceeded in two steps Firstly he established a poly-nomial lower bound
σn
(1radicnMn minus zI
)gt nminusC
asymptotically almost surely for the least singular value of 1radicnMn minus zI This has
the effect of capping off the log x integrand to be of size O(logn) Next by usingStieltjes transform methods the convergence of νnz to νz in an appropriate met-ric (eg the Levi distance metric) was shown to be polynomially fast so that thedistance decayed like O(nminusc) for some c gt 0 The O(nminusc) gain can safely absorbthe O(logn) loss and this leads to a proof of the circular law assuming enoughboundedness and continuity hypotheses to ensure the least singular value boundand the convergence rate This basic paradigm was also followed by later works[365158] with the main new ingredient being the advances in the understandingof the least singular value (Section 1)
Unfortunately to get the polynomial convergence rate one needs some mo-ment conditions beyond the zero mean and unit variance rate (eg finite 2 + ηth
moment for some η gt 0) In [63] the additional tool of the Talagrand concentra-tion inequality (see Exercise 145) was used to eliminate the need for the polyno-mial convergence Intuitively the point is that only a small fraction of the singularvalues of 1radic
nMn minus zI are going to be as small as nminusc most will be much larger
than this and so the O(logn) bound is only going to be needed for a small frac-tion of the measure To make this rigorous it turns out to be convenient to workwith a slightly different formula for the determinant magnitude |det(A)| of asquare matrix than the product of the eigenvalues namely the base-times-heightformula
|det(A)| =nprodj=1
dist(XjVj)
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 23
where Xj is the jth row and Vj is the span of X1 Xjminus1
Exercise 235 Establish the inequalitynprod
j=n+1minusm
σj(A) 6mprodj=1
dist(XjVj) 6mprodj=1
σj(A)
for any 1 6 m 6 n (Hint the middle product is the product of the singular valuesof the first m rows of A and so one should try to use the Cauchy interlacinginequality for singular values) Thus we see that dist(XjVj) is a variant of σj(A)
The least singular value bounds above (and in Remark 135) translated inthis language (with A = 1radic
nMn minus zI) tell us that dist(XjVj) gt nminusC with high
probability this lets us ignore the most dangerous values of j namely those j thatare equal to nminusO(n099) (say) For low values of j say j 6 (1minus δ)n for some smallδ one can use the moment method to get a good lower bound for the distancesand the singular values to the extent that the logarithmic singularity of log x nolonger causes difficulty in this regime the limit of this contribution can then beseen by moment method or Stieltjes transform techniques to be universal in thesense that it does not depend on the precise distribution of the components ofMnIn the medium regime (1minus δ)n lt j lt nminusn099 one can use Talagrandrsquos inequality(Exercise 145) to show that dist(XjVj) has magnitude about
radicnminus j giving rise
to a net contribution to fn(z) of the form 1n
sum(1minusδ)nltjltnminusn099 O(log
radicnminus j)
which is small Putting all this together one can show that fn(z) converges toa universal limit as n rarr infin (independent of the component distributions) see[63] for details As a consequence once the circular law is established for oneclass of iid matrices such as the complex Gaussian random matrix ensemble itautomatically holds for all other ensembles also
Exercise 236 (i) If micro is the circular law measure 1π1|z|61dz show that the
logarithmic potential f(z) is equal to log |z| for |z| gt 1 and |z|2minus12 for |z| 6 1
(ii) If Mn is a random Bernoulli matrix and z is a fixed complex numbershow that the quantity det( 1radic
nMn minus zI) has mean (minusz)n and variancesumnminus1
j=0n|z|2j
nnminusjj Use this to obtain a heuristic prediction as to the size of thequantity fn(z) discussed above and argue why this is consistent with thecircular law and the calculation in (i)
3 The Lindeberg exchange method
The central limit theorem asserts that if X1X2 are a sequence of iid realrandom variables of mean zero and variance one then the normalised means
X1 + middot middot middot+Xnradicn
converge in distribution to the normal distribution N(0 1) as nrarrinfin thus
limnrarrinfin P(
X1 + middot middot middot+Xnradicn
6 t) = P(G 6 t)
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
24 Least singular value circular law and Lindeberg exchange
for any t isin R where G is a normally distributed random variable of mean zeroand variance one Equivalently if F Rrarr R is any smooth compactly supportedfunction then one has
(301) EF
(X1 + middot middot middot+Xnradic
n
)= EF(G) + o(1)
as nrarrinfin where o(1) denotes a quantity that goes to zero as nrarrinfin
Exercise 302 Why are these two claims equivalent
One consequence of the central limit theorem is that the statistics EF(X1+middotmiddotmiddot+Xnradicn
)
are asymptotically universal in the sense that they do not depend on the precisedistribution of the individual random variables In particular if Y1 Yn areanother sequence of iid random variables with a different distribution than theX1 Xn then we have the asymptotic universality property
(303) EF
(X1 + middot middot middot+Xnradic
n
)= EF
(Y1 + middot middot middot+ Ynradic
n
)+ o(1)
as nrarrinfinThe central limit theorem is traditionally proven by Fourier analytic methods
and then the universality (303) obtained as a corollary However one could alsoargue in the reverse direction establishing the central limit theorem (301) as aconsequence Indeed in 1922 Lindeberg [44] established the central limit theoremby establishing the following three claims
(1) If Y1 Y2 were an iid sequence of gaussian random variables of meanzero and variance one then
EF
(Y1 + middot middot middot+ Ynradic
n
)= EF(G) + o(1)
(2) If X1X2 were an iid sequence of random variables mean zero andvariance one and finite third moment E|Xi|
3 ltinfin then
(304) EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)= o(1)
(3) If the central limit theorem (301) was true for iid random variables ofmean zero variance one and finite third moment it would also be truewithout the finite third moment condition
Clearly the central limit theorem is immediate from the above three claimsThe first claim is an easy computation since the sum of any finite number of
independent gaussian random variables is still gaussian the variable Y1+middotmiddotmiddot+Ynradicn
is also gaussian Since this variable has mean zero and variance one the claimfollows (indeed we donrsquot even have the o(1) error in this case)
The third claim is also an easy consequence of a standard truncation argumentSuppose that X1X2 were iid copies of a random variable X of mean zero andvariance one but possibly infinite third moment For any cutoff parameter Mthe truncation X1|X|6M will have finite third moment it might not have meanzero and variance one but from the dominated convergence theorem we see that
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 25
the mean of this random variable goes to zero and the variance goes to one asM rarr infin Similarly the tail X1|X|gtM has variance going to zero as M rarr infinBy adjusting the former random variable slightly to normalise the mean andvariance we thus see that for any ε gt 0 one can obtain a decomposition
X = X primeε +Xprimeprimeε
where X primeε is a random variable of mean zero variance one and finite third mo-ment while X primeprimeε is a random variable of variance at most ε (and necessarily ofmean zero by linearity of expectation) The random variables X primeε and X primeprimeε may becoupled to each other but this will not concern us We can therefore split eachXi as Xi = X primeiε +X
primeprimeiε where X prime1ε X primenε are iid copies of X primeε and X primeprime1ε X primeprimenε
are iid copies of X primeprimeε We then have
X1 + middot middot middot+Xnradicn
=X prime1ε + middot middot middot+X
primenεradic
n+X primeprime1ε + middot middot middot+X
primeprimenεradic
n
If the central limit theorem holds in the case of finite third moment then therandom variable
X prime1ε+middotmiddotmiddot+Xprimenεradic
nconverges in distribution to G Meanwhile the
random variableX primeprime1ε+middotmiddotmiddot+X
primeprimenεradic
nhas mean zero and variance at most ε From this
we see that (301) holds up to an error of O(ε) sending ε to zero we obtain theclaim
The heart of the Lindeberg argument is in the second claim Without lossof generality we may take the tuple (X1 Xn) to be independent of the tuple(Y1 Yn) The idea is not to swap the X1 Xn with the Y1 Yn all at oncebut instead to swap them one at a time Indeed one can write the difference
EF
(X1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Ynradic
n
)as a telescoping series formed by the sum of the n terms
EF
(Y1 + middot middot middot+ Yiminus1 +Xi +Xi+1 + middot middot middot+Xnradic
n
)minus EF
(Y1 + middot middot middot+ Yiminus1 + Yi + middot middot middot+ Ynradic
n
)(305)
for i = 1 n each of which represents a single swap from Xi to Yi Thus toprove (304) it will suffice to show that each of the terms (305) is of size o(1n)(uniformly in i)
We can write (305) as
EF
(Zi +
1radicnXi
)minus EF
(Zi +
1radicnYi
)where Zi is the random variable
Zi =Y1 + middot middot middot+ Yiminus1 +Xi+1 + middot middot middot+Xnradic
n
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
26 Least singular value circular law and Lindeberg exchange
The key point here is that Zi is independent of both Xi and Yi To exploit thiswe use the smoothness of F to perform a Taylor expansion
F(Zi +1radicnXi) = F(Zi) +
1radicnXiFprime(Zi) +
12nX2iFprimeprime(Zi) +O
(1n32 |Xi|
3)
and hence on taking expectations (and using the finite third moment hypothesisand independence of Xi from Zi)
EF(Zi +1radicnXi) = EF(Zi) +
1radicn
EXiEFprime(Zi) +
12n
EX2iEF
primeprime(Zi) +O
(1n32
)
Similarly
EF(Zi +1radicnYi) = EF(Zi) +
1radicn
EYiEFprime(Zi) +
12n
EY2iEF
primeprime(Zi) +O
(1n32
)
Now observe that as Xi and Yi both have mean zero and variance one theirfirst two moments match EXi = EYi and EX2
i = EY2i As such the first three
terms of the above two right-hand sides match and thus (305) is bounded byO(nminus32) which is o(1n) as required Note how important it was that we hadtwo matching moments if we only had one matching moment then the boundfor (305) would only be O(1n) which is not sufficient (And of course thecentral limit theorem would fail as stated if we did not correctly normalise thevariance) One can therefore think of the central limit theorem as a two momenttheorem asserting that the asymptotic behaviour of the statistic EF(X1+middotmiddotmiddot+Xnradic
n) for
iid random variables X1 Xn depends only on the first two moments of the Xi
Exercise 306 (Lindeberg central limit theorem) Let m1m2 be a sequence ofnatural numbers For each n let Xn1 Xnmn be a collection of independentreal random variables of mean zero and total variance one thus
EXni = 0 forall1 6 i 6 mn
andmnsumi=1
EX2ni = 1
Suppose also that for every fixed δ gt 0 (not depending on n) one hasmnsumi=1
EX2ni1|Xni|gtδ = o(1)
as n rarr infin Show that the random variablessummni=1 Xni converge in distribution
as nrarrinfin to a normal variable of mean zero and variance one
Exercise 307 (Martingale central limit theorem) Let F0 sub F1 sub F2 sub be anincreasing collection of σ-algebras in the ambient sample space For each n let Xnbe a real random variable that is measurable with respect to Fn with conditionalmean and variance one with respect to F0 thus
E(Xn|Fnminus1) = 0
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 27
andE(X2
n|Fnminus1) = 1
almost surely Assume also the bounded third moment hypothesis
E(|Xn|3|Fnminus1) 6 C
almost surely for all n and some finite C independent of n Show that the randomvariables X1+middotmiddotmiddot+Xnradic
nconverge in distribution to a normal variable of mean zero
and variance one (One can relax the hypotheses on this martingale central limittheorem substantially but we will not explore this here)
Exercise 308 (Weak Berry-Esseacuteen theorem) Let X1 Xn be an iid sequence ofreal random variables of mean zero and variance 1 and bounded third momentE|Xi|
3 = O(1) Let G be a gaussian random variable of mean zero and varianceone Using the Lindeberg exchange method show that
P(X1 + middot middot middot+Xnradic
n6 t) = P(G 6 t) +O(nminus18)
for any t isin R (The full Berry-Esseacuteen theorem improves the error term toO(nminus12) but this is difficult to establish purely from the Lindeberg exchangemethod one needs alternate methods such as Fourier-based methods or Steinrsquosmethod to recover this improved gain) What happens if one assumes morematching moments between the Xi and G such as matching third moment EX3
i =
EG3 or matching fourth moment EX4i = EG4
One can think of the Lindeberg method as having the schematic form of thetelescoping identity
Xn minus Yn =
nsumi=1
Yiminus1(Xminus Y)Xnminusi
which is valid in any (possibly non-commutative) ring It breaks the symmetry ofthe indices 1 n of the random variables X1 Xn and Y1 Yn by perform-ing the swaps in a specified order A more symmetric variant of the Lindebergmethod was introduced recently by Knowles and Yin [41] and has the schematicform of the fundamental theorem of calculus identity
Xn minus Yn =
int1
0
nsumi=1
((1 minus θ)X+ θY)iminus1(Xminus Y)((1 minus θ)X+ θY)nminusi dθ
which is valid in any (possibly non-commutative) real algebra and can be es-tablished by computing the θ derivative of ((1 minus θ)X+ θY)n We illustrate thismethod by giving a slightly different proof of (304) We may again assumethat the X1 Xn and Y1 Yn are independent of each other We introduceauxiliary random variables t1 tn drawn uniformly at random from [0 1] in-dependently of each other and of the X1 Xn and Y1 Yn For any 0 6 θ 6 1and 1 6 i 6 n let X(θ)
i denote the random variable
X(θ)i = 1ti6θXi + 1tigtθYi
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
28 Least singular value circular law and Lindeberg exchange
thus for instance X(0)i = Yi and X
(1)i = Xi almost surely One then has the
following key derivative computation
Exercise 309 With the notation and assumptions as above show that(3010)
d
dθEF
(X(θ)1 + middot middot middot+X(θ)
nradicn
)=
nsumi=1
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)where
Z(θ)i =
X(θ)1 + middot middot middot+X(θ)
iminus1 +X(θ)i+1 + middot middot middot+X
(θ)nradic
n
In particular the derivative on the left-hand side of (3010) exists and dependscontinuously on θ
From the above exercise and the fundamental theorem of calculus we canwrite the left-hand side of (304) asint1
0
nsumi=1
EF(Z(θ)i +
1radicnXi) minus EF(Z
(θ)i +
1radicnYi) dθ
Repeating the Taylor expansion argument used in the original Lindeberg methodwe see that
EF
(Z(θ)i +
1radicnXi
)minus EF
(Z(θ)i +
1radicnYi
)= O(nminus32)
thus giving an alternate proof of (304)
Remark 3011 In this particular instance the Knowles-Yin version of the Linde-berg method did not offer any significant advantages over the original Lindebergmethod However the more symmetric form of the former is useful in some ran-dom matrix theory applications due to the additional cancellations this inducesSee [41] for an example of this
The Lindeberg exchange method was first employed to study statistics of ran-dom matrices in [19] the method was applied to local statistics first in [61] andthen (with a simplified approach) in [42] (See also [6ndash8] for another applicationof the Lindeberg method to random matrix theory) To illustrate this methodwe will (for simplicity of notation) restrict our attention to real Wigner matri-ces Mn = 1radic
n(ξij)16ij6n where ξij are real random variables of mean zero
and variance either one (if i 6= j) or two (if i = j) with the symmetry condi-tion ξij = ξji for all 1 6 i j 6 n and with the upper-triangular entries ξij1 6 i 6 j 6 n jointly independent For technical reasons we will also assume thatthe ξij are uniformly subgaussian thus there exist constants C c gt 0 such that
P(|ξij| gt t) 6 C exp(minusct2)
for all t gt 0 In particular the kth moments of the ξij will be bounded forany fixed natural number k These hypotheses can be relaxed somewhat butwe will not aim for the most general results here One could easily consider
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 29
Hermitian Wigner matrices instead of real Wigner matrices after some minormodifications to the notation and discussion below (eg replacing the GaussianOrthogonal Ensemble (GOE) with the Gaussian Unitary Ensemble (GUE)) The
1radicn
term appearing in the definition of Mn is a standard normalising factor toensure that the spectrum of Mn (usually) stays bounded (indeed it will almostsurely obey the famous semicircle law restricting the spectrum almost completelyto the interval [minus2 2])
The most well known example of a real Wigner matrix ensemble is the Gauss-ian Orthogonal Ensemble (GOE) in which the ξij are all real gaussian randomvariables (with the mean and variance prescribed as above) This is a highly sym-metric ensemble being invariant with respect to conjugation by any element ofthe orthogonal group O(n) and as such many of the sought-after spectral statis-tics of GOE matrices can be computed explicitly (or at least asymptotically) bydirect computation of certain multidimensional integrals (which in the case ofGOE eventually reduces to computing the integrals of certain Pfaffian kernels)We will not discuss these explicit computations further here but view them asthe analogue to the simple gaussian computation used to establish Claim 1 ofLindebergrsquos proof of the central limit theorem Instead we will focus more onthe analogue of Claim 2 - using an exchange method to compare statistics forGOE to statistics for other real Wigner matrices
We can write a Wigner matrix 1radicn(ξij)16ij6n as a sum
Mn =1radicn
sum(ij)isin∆
ξijEij
where ∆ is the upper triangle
∆ = (i j) 1 6 i 6 j 6 n
and the real symmetric matrix Eij is defined to equal Eij = eTi ej + eTj ei for i lt j
and Eii = eTi ei in the diagonal case i = j where e1 en are the standard basisof Rn (viewed as column vectors) Meanwhile a GOE matrix Gn can be writtenas
Gn =sum
(ij)isin∆ηijEij
where ηij (i j) isin ∆ are jointly independent gaussian real random variables ofmean zero and variance either one (for i 6= j) or two (for i = j) For any naturalnumber m we say that Mn and Gn have m matching moments if we have
Eξkij = Eηkij
for all k = 1 m Thus for instance we always have two matching momentsby our definition of a Wigner matrix
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
30 Least singular value circular law and Lindeberg exchange
If S(Mn) is any (deterministic) statistic of a Wigner matrix Mn we say that wehave a m moment theorem for the statistic S(Mn) if one has
(3012) S(Mn) minus S(Gn) = o(1)
whenever Mn and Gn have m matching moments In most applications m willbe very small either equal to 2 3 or 4
One can prove moment theorems using the Lindeberg exchange method Forinstance consider statistics of the form S(Mn) = EF(Mn) where F V rarr C is abounded measurable function on the space V of real symmetric matrices Thenwe can write the left-hand side of (3012) as the sum of |∆| = n(n+1)
2 terms of theform
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)where for each (i j) isin ∆ Mnij is the real symmetric matrix
Mnij =sum
(i primej prime)lt(ij)
1radicnηi primej primeEi primej prime +
sum(i primej prime)gt(ij)
1radicnξi primej primeEi primej prime
where one imposes some arbitrary ordering lt on ∆ (eg the lexicographicalordering) Strictly speaking the Mnij are not Wigner matrices because theirij entry vanishes and thus has zero variance but their behaviour turns out tobe almost identical to that of a Wigner matrix (being a rank one or rank twoperturbation of such a matrix) Thus to prove anmmoment theorem for EF(Mn)it thus suffices by the triangle inequality to establish a bound of the form
(3013) EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= o(nminus2)
for all (i j) isin ∆ whenever ξij and ηij have m matching moments (For thediagonal entries i = j a bound of o(nminus1) will in fact suffice as the number ofsuch entries is n rather than O(n2))
As with the Lindeberg proof of the central limit theorem one can establish(3013) for various statistics F by performing a Taylor expansion with remainderTo illustrate this we consider the expectation Es(Mn z) of the Stieltjes transform
s(Mn z) =1n
trR(Mn z)
for some complex number z = E+ iη with η gt 0 where R(Mn z) = (Mn minus z)minus1
denotes the resolvent (also known as the Greenrsquos function) and we identify z withthe matrix zIn The application of the Lindeberg exchange method to quantitiesrelating to Greenrsquos functions is also referred to as the Greenrsquos function comparisonmethod The Stieltjes transform is closely tied to the behavior of the eigenvaluesλ1 λn of Mn (counting multiplicity) thanks to the spectral identity
(3014) s(Mn z) =1n
nsumj=1
1λj minus z
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 31
On the other hand the resolvents R(Mn z) are particularly amenable to the Lin-deberg exchange method thanks to the resolvent identity
R(Mn +An z) = R(Mn z) minus R(Mn z)AnR(Mn +An z)
whenever MnAn z are such that both sides are well defined (eg if MnAn arereal symmetric and z has positive imaginary part) this identity is easily verifiedby multiplying both sides by Mn minus z on the left and Mn +An minus z on the rightOne can iterate this to obtain the Neumann series
(3015) R(Mn +An z) =infinsumk=0
(minusR(Mn z)An)kR(Mn z)
assuming that the matrix R(Mn z)An has spectral radius less than one Takingnormalised traces and using the cyclic property of trace we conclude in particularthat(3016)
s(Mn +An z) = s(Mn z) +infinsumk=1
(minus1)k
ntr(R(Mn z)2An(R(Mn z)An)kminus1
)
To use these identities we invoke the following useful facts about Wigner ma-trices
Theorem 3017 LetMn be a real Wigner matrix let λ1 λn be the eigenvalues andlet u1 un be an orthonormal basis of eigenvectors Let A gt 0 be any constant Thenwith probability 1 minusOA(n
minusA) the following statements hold
(i) (Weak local semi-circular law) For any interval I sub R the number of eigenvaluesin I is at most no(1)(1 +n|I|)
(ii) (Eigenvector delocalisation) All of the coefficients of all of the eigenvectors u1 unhave magnitude O(nminus12+o(1))
The same claims hold if one replaces one of the entries of Mn together with its transposeby zero
Proof See for instance [61 Theorem 60 Proposition 62 Corollary 63] relatedresults are also given in the lectures of Erdos Estimates of this form were intro-duced in the work of Erdos Schlein and Yau [27ndash29] One can sharpen theseestimates in various ways (eg one has improved estimates near the edge ofthe spectrum) but the form of the bounds listed here will suffice for the currentdiscussion
This yields some control on resolvents
Exercise 3018 Let Mn be a real Wigner matrix let A gt 0 be a constant and letz = E+ iη with η gt 0 Show that with probability 1minusOA(nminusA) all coefficients ofR(Mn z) are of magnitude O(no(1)(1+ 1
nη )) and all coefficients of R(Mn z)2 areof magnitude O(no(1)ηminus1(1 + 1
nη )) (Hint use the spectral theorem to expressR(Mn z) in terms of the eigenvalues and eigenvectors of Mn You may wish
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
32 Least singular value circular law and Lindeberg exchange
to treat the η gt 1n and η 6 1n cases separately) The bounds here are notoptimal regarding the off-diagonal terms of R(Mn z) or R(Mn z)2 see [31] forsome stronger estimates
On the exceptional event where the above exercise fails we can use the crudeestimate (from spectral theory) that R(Mn z) has operator norm at most 1η
We are now ready to establish (3013) for the Stieltjes transform statistic F(Mn) =s(Mn z) for certain choices of spectral parameter z = E+ iη and certain choicesof Wigner ensemble Mn Let (i j) be an element of ∆ By Exercise 3018 wehave with probability 1 minusO(nminus100) (say) that all coefficients of R(Mnij z) areof magnitude O(no(1)(1 + 1
nη )) and all coefficients of R(Mnij z)2 are of mag-nitude O(ηminus1no(1)(1 + 1
nη )) also from the subgaussian hypothesis we may as-sume that ξij = O(no(1)) without significantly increasing the failure probabil-ity of the above event Among other things this implies that R(Mnij z)Eijhas spectral radius O(no(1)(1 + 1
nη )) Conditioning to this event and assum-ing that η gt nminus32+ε for some fixed ε gt 0 (to keep the spectral radius ofR(Mnij z) 1radic
nξijEij less than one) we then see from (3016) that
F(Mnij +1radicnξijEij) = F(Mnij)
+
infinsumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)
From the coefficient bounds we see that the trace here is of size O(ηminus1(no(1)(1+1nη ))
k) (where the no(1) expression or the implied constant in the O() notationdoes not depend on k) Thus we may truncate the sum at any stage to obtain
F
(Mnij +
1radicnξijEij
)= F(Mnij)
+
msumk=1
(minusξij)k
n1+k2 tr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+Om
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
with probability 1 minusO(nminus100) On the exceptional event we can bound all termson the left and right-hand side crudely by O(n10) (say) Taking expectations andusing the independence of ξij from Mnij we conclude that
EF
(Mnij +
1radicnξijEij
)= EF(Mnij)
+
msumk=1
E(minusξij)k
n1+k2 Etr(R(Mnij z)2Eij(R(Mnij z)Eij)kminus1
)+O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 33
for any 1 6 m 6 10 (say) Similarly with ξij replaced by ηij If we assume thatξij and ηij have m matching moments in the sense that
Eξkij = Eηkij
for all k = 1 m we conclude on subtracting that
EF
(Mnij +
1radicnξijEij
)minus EF
(Mnij +
1radicnηijEij
)= O
(ηminus1nminusm+3
2 +o(1)(1 +1nη
)m+1)
Comparing this with (3013) we arrive at the following conclusions for the statis-tic F(Mn) = s(Mn z)
bull By definition of a Wigner matrix we already have 2 matching momentsSetting m = 2 we conclude that Es(Mn z) enjoys a two moment theoremwhenever η gt nminus12+ε for some fixed ε gt 0bull If we additionally assume a third matching moment Eξ3
ij = Eη3ij then we
may set m = 3 and we conclude that Es(Mn z) enjoys a three momenttheorem whenever η gt nminus1+ε for a fixed ε gt 0
bull If we assume third and fourth matching moments Eξ3ij = Eη3
ij Eξ4ij =
Eη4ij then we may set m = 4 and we conclude that Es(Mn z) enjoys a
four moment theorem whenever η gt nminus 1312+ε for some ε gt 0
Thus we see that the expected Stieltjes transform Es(Mn z) has some univer-sality although the amount of universality provided by the Lindeberg exchangemethod degrades as z approaches the real axis Additional matching momentsbeyond the fourth will allow one to approach the real axis even further althoughwith the bounds provided here one cannot get closer than nminus32 due to the poten-tial divergence of the Neumann series beyond this point Some of the exponentsin the above results can be improved by using more refined control on the Stielt-jes transform and by using the Knowles-Yin variant of the Lindeberg exchangemethod see for instance [41]
One can generalise these arguments to more complicated statistics than theStieltjes transform s(Mn z) For instance one can establish a four moment theo-rem for multilinear averages of the Stieltjes transform
Exercise 3019 Let k be a fixed natural number and let ψ Rk rarr C be a smoothcompactly supported function both of which are independent of n Let z = E+ iηbe a complex number with η gt nminus1minus 1
100k Show that the statistic
E
intRkψ(t1 tk)
kprodl=1
s
(Mn z+
tln
)dt1 dtk
enjoys a four moment theorem Similarly if one replaces one or more of thes(Mn z+ tl
n ) with their complex conjugates
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
34 Least singular value circular law and Lindeberg exchange
Using this one can then obtain analogous four moment theorems for k-pointcorrelation functions Recall that for any fixed 1 6 k 6 n the k-point correlationfunction ρ(k) Rk rarr R+ of a random real symmetric (or Hermitian) matrix Mnis defined via duality by requiring that ρ(k) be symmetric and obey the relationint
Rkρ(k)(x1 xk)F(x1 xk) dx1 dxk = E
sum16i1ltmiddotmiddotmiddotltik6n
F(λi1 λik)
for all continuous compactly supported symmetric functions F Rk rarr C whereλ1 6 middot middot middot 6 λn denote the eigenvalues of Mn arranged in increasing order (andcounting multiplicity) From the Riesz representation theorem we see that thisdefines ρ(k) as a Radon measure at least if Mn has an absolutely continuousdistribution (as is the case for instance with the GOE ensemble) then ρ will infact be a locally integrable function Setting k = 1 and F(x) = 1
xminusz we see inparticular that
(3020)int
R
ρ(1)(x)dx
xminus z= nEs(Mn z)
Similarly setting k = 2 and F(x1 x2) = 1x1minusz1
1x2minusz2
+ 1x1minusz2
1x2minusz1
then settingk = 1 and F(x) = 1
(xminusz1)(xminusz2) and adding we see that
2int
R2ρ(2)(x1 x2)
dx1dx2
(x1 minus z1)(x2 minus z2)
+
intR2ρ(1)(x)
dx
(xminus z1)(xminus z2)= n2Es(Mn z1)s(Mn z2)
By combining these sorts of identities with Exercise 3019 one can obtain fourmoment theorems for correlation functions For instance we have
Proposition 3021 Let E be a real number and let ψ R rarr R be a smooth compactlysupported function Then the statistic
1n
intR
ρ(1)(E+s
n)ψ(s) ds
enjoys a four moment theorem
Proof Set η = nminus1minus 1100 From Exercise 3019 the statistic
E
intR
ψ(t)s(MnE+t
n+ iη) dt
enjoys a four moment theorem Applying (3020) we conclude that1n
intR
intR
ρ(1)(x)ψ(t)dxdt
xminus Eminus tn minus iη
enjoys a four moment theorem Making the change of variables x = E+ sn this
becomes1n
intR
ρ(1)(E+s
n)
(intR
ψ(t) dt
sminus tminus inη
)ds
Taking imaginary parts and dividing by π we conclude that
1n
intR
ρ(1)(E+s
n)
(intR
nηψ(t) dt
π((sminus t)2 + (nη)2)
)ds
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
Terence Tao 35
enjoys a four moment theorem However from the smoothness and compactsupport of ψ and the fact that the Cauchy distribution nη
π((sminust)2+(nη)2)integrates
to one one can establish a bound of the formintR
nηψ(t) dt
π((sminus t)2 + (nη)2)= ψ(s) +O
(nminus1100
1 + s2
)(Exercise) On the other hand from (3020) and Theorem 3017 one can showthat
1n
intR
ρ(1)(E+
s
n
) ds
1 + s2 no(1)
(Exercise) The claim now follows from the triangle inequality
Exercise 3022 Justify the two steps marked (Exercise) in the above proof
Exercise 3023 If k is a fixed natural number E1 Ek are real numbers andψ Rk rarr R is a smooth compactly supported function show that the statistic
1nk
intR
ρ(k)(E1 +
s1
n Ek +
skn
)ψ(s1 sk) ds1 sk
enjoys a four moment theorem
With a bit more effort (using now the real part of the Stieltjes transform in ad-dition to the imaginary part) one can also use Exercise 3019 to establish a fourmoment theorem for statistics involving the individual eigenvalues and eigenvec-tors of Mn see [41] for details Such theorems were also established by directTaylor expansion of the eigenvalues and eigenvectors see [61] [62]
Of course one would like to also establish universality results for classes ofrandom matrices with fewer than four matching moments At the scale of themean eigenvalue spacing of 1n it appears (in the bulk at least) that the Linde-berg exchange method is not powerful enough on its own to accomplish this taskHowever the Lindeberg exchange method combines well with other universalityresults involving ensembles in which the third and fourth moments are allowedto vary For instance a breakthrough work of Johansson [39] obtained universal-ity results for matrices that were gauss divisible in the sense that they were of theform
Mtn = (1 minus t)12Mn + t12Gn
where Gn was a GUE matrix Mn was a (Hermitian) Wigner matrix (with thesame first two moments as the GUE matrix) and 0 lt t lt 1 was a fixed param-eter This combines well with the four moment theorem to obtain universalityresults for a relatively wide class of Hermitian Wigner matrices see [61] Thisapproach has not yet been extended to the case of real Wigner matrices Howevera different way to obtain universality between a real Wigner matrix Mn and aGOE matrix Gn (which we take to be independent of Mn) is not to exchangethe elements from Mn to Gn one at a time but instead to consider an Ornstein-Uhlenbeck type process that flows continuously from Mn to Gn from time t = 0
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
36 References
to time t = 1 by the formula
Mtn = (1 minus t)12Mn + t12Gn
Clearly Mtn equals Mn when t = 0 and Gn when t = 1 For any intermediatetime t Mtn is a real Wigner matrix (of a special type known as a Gauss divisibleWigner matrix) and the third and fourth moments of the components of Mtnvary in some explicit polynomial fashion from those of ξij to those of ηij A keyfeature of this flow is that the eigenvalues of Mtn evolve by the laws of DysonBrownian motion Using techniques such as the method of local relaxation flow asdiscussed in the lecture notes of Erdos one can obtain good universality results10
that compareMtn to Gn for t as low as nminus1+ε for any fixed ε gt 0 Meanwhile theLindeberg exchange method can be used to compare11 Mtn to Mn for t = nminus1+εCombining these two claims one can obtain universality results for Wigner ma-trices that do not require additional matching moments beyond the second seeeg [26] or [31] On the other hand there are some spectral statistics particularlythose involving a fixed eigenvalue of a Wigner matrix for which the matching ofthe fourth moment is in fact necessary see [64] [24] Also at this present timethe Lindeberg exchange method is the only tool that has been effectively used toobtain local universality results for non-Hermitian matrices [65]
References[1] Radosław Adamczak On the Marchenko-Pastur and circular laws for some classes of random matrices
with dependent entries Electron J Probab 16 (2011) no 37 1068ndash1095 DOI 101214EJPv16-899MR2820070larr16
[2] Radosław Adamczak and Djalil Chafaiuml Circular law for random matrices with unconditionallog-concave distribution Commun Contemp Math 17 (2015) no 4 1550020 22 DOI101142S0219199715500200 MR3359233larr16
[3] Radosław Adamczak Djalil Chafaiuml and Paweł Wolff Circular law for random matrices with ex-changeable entries Random Structures Algorithms 48 (2016) no 3 454ndash479 DOI 101002rsa20599MR3481269larr16
[4] Radosław Adamczak Olivier Gueacutedon Alexander Litvak Alain Pajor and Nicole Tomczak-Jaegermann Smallest singular value of random matrices with independent columns C R Math AcadSci Paris 346 (2008) no 15-16 853ndash856 DOI 101016jcrma200807011 (English with Englishand French summaries) MR2441920larr4
[5] J Alt L Erdos and T Krueger Local inhomogeneous circular law preprintlarr16[6] Marwa Banna Limiting spectral distribution of Gram matrices associated with functionals of β-
mixing processes J Math Anal Appl 433 (2016) no 1 416ndash433 DOI 101016jjmaa201507064MR3388800larr28
[7] Marwa Banna and Florence Merlevegravede Limiting spectral distribution of large sample covariance ma-trices associated with a class of stationary processes J Theoret Probab 28 (2015) no 2 745ndash783 DOI101007s10959-013-0508-x MR3370674larr28
10Strictly speaking the method of local relaxation flow requires an additional averaging in the energyparameter E in order to obtain this claim However more recent methods are now available to avoidthis averaging in energy see [15]11There is a minor additional issue because the third and fourth moments of the coefficients of Mn
and Mtn are not quite identical however it turns out that the additional error terms caused by this
are negligible when t = nminus1+ε for ε small enough
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
References 37
[8] Marwa Banna Florence Merlevegravede and Magda Peligrad On the limiting spectral distribution for alarge class of symmetric random matrices with correlated entries Stochastic Process Appl 125 (2015)no 7 2700ndash2726 DOI 101016jspa201501010 MR3332852larr28
[9] Z D Bai Circular law Ann Probab 25 (1997) no 1 494ndash529 DOI 101214aop1024404298MR1428519larr16 22
[10] A Basak N Cook and O Zeitouni Circular law for the sum of random permutation matricespreprintlarr16
[11] A Basak and M Rudelson Circular law for sparse matrices preprintlarr16[12] Anirban and Dembo Basak Amir Limiting spectral distribution of sums of unitary and orthogonal
matrices Electron Commun Probab 18 (2013) no 69 19larr16[13] Charles Bordenave Pietro Caputo and Djalil Chafaiuml Circular law theorem for random Markov ma-
trices Probab Theory Related Fields 152 (2012) no 3-4 751ndash779 DOI 101007s00440-010-0336-1MR2892961larr16
[14] Charles Bordenave and Djalil Chafaiuml Around the circular law Probab Surv 9 (2012) 1ndash89 DOI10121411-PS183 MR2908617larr16
[15] Paul Bourgade Laszlo Erdos Horng-Tzer Yau and Jun Yin Fixed energy universality for generalizedWigner matrices Comm Pure Appl Math 69 (2016) no 10 1815ndash1881 DOI 101002cpa21624MR3541852larr36
[16] Paul and Yau Bourgade Horng-Tzer and Yin Local circular law for random matrices Probab TheoryRelated Fields 159 (2014) no 3-4 545ndash595larr16
[17] Paul and Yau Bourgade Horng-Tzer and Yin The local circular law II the edge case Probab TheoryRelated Fields 159 (2014) no 3-4 619ndash660larr16
[18] Jean Bourgain Van H Vu and Philip Matchett Wood On the singularity probability of discrete ran-dom matrices J Funct Anal 258 (2010) no 2 559ndash603 DOI 101016jjfa200904016 MR2557947
larr7 9[19] S Chatterjee A simple invariance theorem preprintlarr28[20] N Cook The circular law for random regular digraphs preprintlarr16[21] N Cook W Hachem J Najim and D Renfrew Limiting spectral distribution for non-hermitian
random matrices with a variance profile preprintlarr16[22] Alan Edelman The probability that a random real Gaussian matrix has k real eigenvalues re-
lated distributions and the circular law J Multivariate Anal 60 (1997) no 2 203ndash232 DOI101006jmva19961653 MR1437734larr16
[23] Alan Edelman Eigenvalues and condition numbers of random matrices SIAM J Matrix Anal Appl 9(1988) no 4 543ndash560 DOI 1011370609045 MR964668larr13
[24] Alan Edelman A Guionnet and S Peacutecheacute Beyond universality in random matrix theory Ann ApplProbab 26 (2016) no 3 1659ndash1697 DOI 10121415-AAP1129 MR3513602larr36
[25] P Erdoumls On a lemma of Littlewood and Offord Bull Amer Math Soc 51 (1945) 898ndash902 DOI101090S0002-9904-1945-08454-7 MR0014608larr7
[26] Laacuteszloacute Erdos Joseacute Ramiacuterez Benjamin Schlein Terence Tao Van Vu and Horng-Tzer Yau Bulkuniversality for Wigner Hermitian matrices with subexponential decay Math Res Lett 17 (2010) no 4667ndash674 DOI 104310MRL2010v17n4a7 MR2661171larr36
[27] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Semicircle law on short scales and delocal-ization of eigenvectors for Wigner random matrices Ann Probab 37 (2009) no 3 815ndash852 DOI10121408-AOP421 MR2537522larr31
[28] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Local semicircle law and complete delo-calization for Wigner random matrices Comm Math Phys 287 (2009) no 2 641ndash655 DOI101007s00220-008-0636-9 MR2481753larr31
[29] Laacuteszloacute Erdos Benjamin Schlein and Horng-Tzer Yau Wegner estimate and level repulsion forWigner random matrices Int Math Res Not IMRN 3 (2010) 436ndash479 DOI 101093imrnrnp136MR2587574larr31
[30] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Bulk universality for generalized Wigner matricesProbab Theory Related Fields 154 (2012) no 1-2 341ndash407 DOI 101007s00440-011-0390-3MR2981427larr
[31] Laacuteszloacute Erdos Horng-Tzer Yau and Jun Yin Rigidity of eigenvalues of generalized Wigner matricesAdv Math 229 (2012) no 3 1435ndash1515 DOI 101016jaim201112010 MR2871147larr32 36
[32] Stuart Geman A limit theorem for the norm of random matrices Ann Probab 8 (1980) no 2 252ndash261larr3
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
38 References
[33] Jean Ginibre Statistical ensembles of complex quaternion and real matrices J Mathematical Phys 6(1965) 440ndash449larr16
[34] V L Girko The circular law Teor Veroyatnost i Primenen 29 (1984) no 4 669ndash679 (Russian)MR773436larr16 21
[35] V L Girko The strong circular law Twenty years later II Random Oper Stochastic Equations 12(2004) no 3 255ndash312 DOI 1011631569397042222477 MR2085255larr16
[36] F Goumltze and A Tikhomirov On the circular law preprintlarr11 16 22[37] Alice Guionnet Manjunath Krishnapur and Ofer Zeitouni The single ring theorem Ann of Math
(2) 174 (2011) no 2 1189ndash1217 DOI 104007annals2011174210 MR2831116larr16[38] G Halaacutesz Estimates for the concentration function of combinatorial number theory and probability
Period Math Hungar 8 (1977) no 3-4 197ndash211 DOI 101007BF02018403 MR0494478larr9[39] Kurt Johansson Universality of the local spacing distribution in certain ensembles of Hermitian Wigner
matrices Comm Math Phys 215 (2001) no 3 683ndash705 DOI 101007s002200000328 MR1810949
larr35[40] Jeff Kahn Jaacutenos Komloacutes and Endre Szemereacutedi On the probability that a random plusmn1-matrix is
singular J Amer Math Soc 8 (1995) no 1 223ndash240 DOI 1023072152887 MR1260107larr9[41] A Knowles and J Yin Anisotropic local laws for random matrices preprintlarr27 28 33 35[42] Antti Knowles and Jun Yin Eigenvector distribution of Wigner matrices Probab Theory Related
Fields 155 (2013) no 3-4 543ndash582 DOI 101007s00440-011-0407-y MR3034787larr28[43] J Komloacutes On the determinant of (0 1) matrices Studia Sci Math Hungar 2 (1967) 7ndash21 MR0221962larr2 7
[44] J W Lindeberg Eine neue Herleitung des Exponentialgesetzes in der WahrscheinlichkeitsrechnungMath Z 15 (1922) no 1 211ndash225 DOI 101007BF01494395 (German) MR1544569larr24
[45] Madan Lal Mehta Random matrices 3rd ed Pure and Applied Mathematics (Amsterdam)vol 142 ElsevierAcademic Press Amsterdam 2004 MR2129906larr
[46] James A and Speicher Mingo Roland Free probability and random matrices Fields Institute Mono-graphs vol 35 Springer New York Fields Institute for Research in Mathematical SciencesToronto ON 2017larr19
[47] Hoi H Nguyen Random doubly stochastic matrices the circular law Ann Probab 42 (2014) no 31161ndash1196 DOI 10121413-AOP877 MR3189068larr16
[48] Hoi H Nguyen and V H Vu Normal vector of a random hyperplane 3 (2016) available atarXiv160404897larr13
[49] H Nguyen and S OrsquoRourke The elliptic law preprintlarr16[50] Hoi H Nguyen and Van H Vu Circular law for random discrete matrices of given row sum J Comb
4 (2013) no 1 1ndash30 DOI 104310JOC2013v4n1a1 MR3064040larr16[51] Guangming Pan and Wang Zhou Circular law extreme singular values and potential theory J Mul-
tivariate Anal 101 (2010) no 3 645ndash656 DOI 101016jjmva200908005 MR2575411 larr11 1622
[52] Mark Rudelson Invertibility of random matrices norm of the inverse Ann of Math (2) 168 (2008)no 2 575ndash600 DOI 104007annals2008168575 MR2434885larr2 10
[53] Mark Rudelson and Roman Vershynin The Littlewood-Offord problem and invertibility of randommatrices Adv Math 218 (2008) no 2 600ndash633 DOI 101016jaim200801010 MR2407948larr10
[54] Mark Rudelson and Roman Vershynin The least singular value of a random square ma-trix is O(nminus12) C R Math Acad Sci Paris 346 (2008) no 15-16 893ndash896 DOI101016jcrma200807009 (English with English and French summaries) MR2441928larr12
[55] Michel Talagrand Concentration of measure and isoperimetric inequalities in product spaces InstHautes Eacutetudes Sci Publ Math 81 (1995) 73ndash205 MR1361756larr12
[56] Terence Tao Topics in random matrix theory Graduate Studies in Mathematics vol 132 AmericanMathematical Society Providence RI 2012 MR2906465larr1
[57] Terence Tao and Van Vu On the singularity probability of random Bernoulli matrices J Amer MathSoc 20 (2007) no 3 603ndash628 DOI 101090S0894-0347-07-00555-3 MR2291914larr9
[58] Terence Tao and Van Vu Random matrices the circular law Commun Contemp Math 10 (2008)no 2 261ndash307 DOI 101142S0219199708002788 MR2409368larr11 16 22
[59] Terence Tao and Van H Vu Inverse Littlewood-Offord theorems and the condition number of randomdiscrete matrices Ann of Math (2) 169 (2009) no 2 595ndash632 DOI 104007annals2009169595MR2480613larr10
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-
References 39
[60] Terence Tao and Van Vu Random matrices the distribution of the smallest singular values GeomFunct Anal 20 (2010) no 1 260ndash297 DOI 101007s00039-010-0057-8 MR2647142larr13
[61] Terence Tao and Van Vu Random matrices universality of local eigenvalue statistics Acta Math 206(2011) no 1 127ndash204 DOI 101007s11511-011-0061-3 MR2784665larr28 31 35
[62] Terence Tao and Van Vu Random matrices universal properties of eigenvectors Random MatricesTheory Appl 1 (2012) no 1 1150001 27 DOI 101142S2010326311500018 MR2930379larr35
[63] Terence Tao and Van Vu Random matrices universality of ESDs and the circular law Ann Probab 38(2010) no 5 2023ndash2065 DOI 10121410-AOP534 With an appendix by Manjunath KrishnapurMR2722794larr16 22 23
[64] Terence Tao and Van Vu Random matrices localization of the eigenvalues and the necessity of fourmoments Acta Math Vietnam 36 (2011) no 2 431ndash449 MR2908537larr36
[65] Terence and Vu Tao Van Random matrices universality of local spectral statistics of non-Hermitianmatrices Ann Probab 43 (2015) no 2 782ndash874larr16 36
[66] L N Trefethen Pseudospectra of matrices Numerical analysis 1991 (Dundee 1991) Pitman ResNotes Math Ser vol 260 Longman Sci Tech Harlow 1992 pp 234ndash266 MR1177237larr17
[67] Philip Matchett Wood Universality and the circular law for sparse random matrices Ann ApplProbab 22 (2012) no 3 1266ndash1300 DOI 10121411-AAP789 MR2977992larr16
[68] Jun Yin The local circular law III general case Probab Theory Related Fields 160 (2014) no 3-4679ndash732larr16
UCLA Department of Mathematics Los Angeles CA 90095-1555E-mail address taomathuclaedu
- The least singular value
-
- The epsilon-net argument
- Singularity probability
- Lower bound for the least singular value
- Upper bound for the least singular value
- Asymptotic for the least singular value
-
- The circular law
-
- Spectral instability
- Incompleteness of the moment method
- The logarithmic potential
-
- The Lindeberg exchange method
-