UNIFORM ERROR ESTIMATES FOR THE LANCZOS METHOD

JOHN C. URSCHEL

Abstract. The Lanczos method is one of the most powerful and fundamental techniques for solving an extremal symmetric eigenvalue problem. Convergence-based error estimates are well studied, with the estimate depending heavily on the eigenvalue gap. However, in practice, this gap is often relatively small, resulting in significant overestimates of error. One way to avoid this issue is through the use of uniform error estimates, namely, bounds that depend only on the dimension of the matrix and the number of iterations. In this work, we prove a number of upper and lower uniform error estimates for the Lanczos method. These results include the first known lower bounds for error in the Lanczos method and significantly improved upper bounds for error measured in the $p$-norm, $p > 1$. These lower bounds imply that the maximum error of $m$ iterations of the Lanczos method over all $n \times n$ symmetric matrices does indeed depend on the dimension $n$. The improved upper and lower bounds for extremal eigenvalues translate immediately to error estimates for the condition number of a symmetric positive definite matrix. In addition, we prove more specific results for matrices that possess some level of eigenvalue regularity or whose eigenvalues converge to some limiting empirical spectral distribution. In particular, if this limiting distribution can be bounded above and below by a Jacobi weight function, then the relative error after $m$ iterations is of order $m^{-2}$ with high probability. Through numerical experiments, we show that the theoretical estimates of this paper do apply to practical computations for reasonably sized matrices.

1991 Mathematics Subject Classification. 15A18, 65F15, 65F50.
Key words and phrases. symmetric eigenvalue problem, Krylov subspace method, Lanczos method.

1. Introduction

The computation of extremal eigenvalues of matrices is one of the most fundamental problems in numerical linear algebra. When a given matrix is large and sparse, methods such as the Jacobi eigenvalue algorithm and QR algorithm become computationally infeasible, and, therefore, techniques that take advantage of the sparsity of the matrix are required. Krylov subspace methods are a powerful class of techniques for approximating extremal eigenvalues, most notably the Arnoldi iteration for non-symmetric matrices and the Lanczos method for symmetric matrices.

The Lanczos method is a technique that, given a symmetric matrix $A \in \mathbb{R}^{n\times n}$ and an initial vector $b \in \mathbb{R}^n$, iteratively computes a tridiagonal matrix $T_m \in \mathbb{R}^{m\times m}$ that satisfies $T_m = Q_m^T A Q_m$, where $Q_m \in \mathbb{R}^{n\times m}$ is an orthonormal basis of the Krylov subspace
$$K_m(A,b) = \mathrm{span}\{b, Ab, ..., A^{m-1}b\}.$$

The eigenvalues of $T_m$, denoted by $\lambda_1^{(m)}(A,b) \ge \dots \ge \lambda_m^{(m)}(A,b)$, are the Rayleigh–Ritz approximations to the eigenvalues $\lambda_1(A) \ge \dots \ge \lambda_n(A)$ of $A$ on $K_m(A,b)$, and, therefore, are given by
$$\lambda_i^{(m)}(A,b) = \min_{\substack{U \subset K_m(A,b)\\ \dim(U) = m+1-i}} \ \max_{\substack{x \in U\\ x \ne 0}} \frac{x^T A x}{x^T x}, \qquad i = 1, \dots, m, \tag{1}$$
or, equivalently,
$$\lambda_i^{(m)}(A,b) = \max_{\substack{U \subset K_m(A,b)\\ \dim(U) = i}} \ \min_{\substack{x \in U\\ x \ne 0}} \frac{x^T A x}{x^T x}, \qquad i = 1, \dots, m. \tag{2}$$
This description of the Lanczos method is sufficient for a theoretical analysis of error (i.e., without round-off error), but, for completeness, we provide a short description of the Lanczos method (when $K_m(A,b)$ is full-rank) in Algorithm 1 [17]. For a more detailed discussion of the nuances of practical implementation and techniques to minimize the effects of round-off error, we refer the reader to [7, Section 10.3]. If $A$ has $\nu n$ non-zero entries, then Algorithm 1 outputs $T_m$ after approximately $(2\nu + 8)mn$ floating point operations.



Algorithm 1 (Lanczos Method)
Input: symmetric matrix $A \in \mathbb{R}^{n\times n}$, vector $b \in \mathbb{R}^n$, number of iterations $m$.
Output: symmetric tridiagonal matrix $T_m \in \mathbb{R}^{m\times m}$, $T_m(i,i) = \alpha_i$, $T_m(i,i+1) = \beta_i$, satisfying $T_m = Q_m^T A Q_m$, where $Q_m = [q_1 \ \dots \ q_m]$.

Set $\beta_0 = 0$, $q_0 = 0$, $q_1 = b/\|b\|$
For $i = 1, \dots, m$:
    $v = A q_i$
    $\alpha_i = q_i^T v$
    $v = v - \alpha_i q_i - \beta_{i-1} q_{i-1}$
    $\beta_i = \|v\|$
    $q_{i+1} = v/\beta_i$
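For concreteness, the following is a minimal NumPy sketch of Algorithm 1 (the dense-matrix interface and the function name lanczos are illustrative choices, not part of the original; practical implementations add reorthogonalization to combat the round-off effects discussed in [7, Section 10.3]):

import numpy as np

def lanczos(A, b, m):
    # Computes T_m = Q_m^T A Q_m for the Krylov subspace K_m(A, b),
    # following Algorithm 1 directly; no reorthogonalization is performed.
    n = b.shape[0]
    Q = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(m):
        v = A @ Q[:, i]
        alpha[i] = Q[:, i] @ v
        v -= alpha[i] * Q[:, i]
        if i > 0:
            v -= beta[i - 1] * Q[:, i - 1]
        beta[i] = np.linalg.norm(v)
        if beta[i] == 0:
            break  # K_m(A, b) is rank-deficient; the current T is already exact
        Q[:, i + 1] = v / beta[i]
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

The Ritz values $\lambda_i^{(m)}(A,b)$ are then the eigenvalues of the returned tridiagonal matrix, computable via, e.g., np.linalg.eigvalsh.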

From here, the eigenvalues of a tridiagonal matrix can be easily computed using a number of different techniques, such as the divide-and-conquer algorithm and the fast multipole method. The complexity of this computation is typically negligible compared to the Lanczos tridiagonalization procedure, as, in practice, when using an iterative method to estimate extreme eigenvalues of a large matrix, $m$ is orders of magnitude less than $n$.

Equation (1) for the Ritz values $\lambda_i^{(m)}$ illustrates the significant improvement that the Lanczos method provides over the power method for extremal eigenvalues. Whereas the power method uses only the iterate $A^m b$ as an approximation of the largest magnitude eigenvalue, the Lanczos method uses the span of all of the iterates of the power method (given by $K_{m+1}(A,b)$). However, the analysis of the Lanczos method is significantly more complicated than that of the power method. While the majority of approximation results for the power method are easily derived, a number of open questions regarding the analysis of the Lanczos method remain. Error estimates for extremal eigenvalue approximation using the Lanczos method have been well studied, most notably by Kaniel [8], Paige [12], Saad [14], and Kuczynski and Wozniakowski [9] (other notable work includes [4, 10, 13, 18]). The work of Kaniel, Paige, and Saad focuses on the convergence of the Lanczos method as $m$ increases, and, therefore, their results have strong dependence on the spectrum of the matrix $A$ and the choice of initial vector $b$. For example, a standard result of this type is the estimate

$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \le \left(\frac{\tan \angle(b, \varphi_1)}{T_{m-1}(1 + 2\gamma)}\right)^2, \qquad \gamma = \frac{\lambda_1(A) - \lambda_2(A)}{\lambda_2(A) - \lambda_n(A)}, \tag{3}$$
where $\varphi_1$ is the eigenvector corresponding to $\lambda_1$, and $T_{m-1}$ is the $(m-1)$th degree Chebyshev polynomial of the first kind [15, Theorem 6.4]. Kuczynski and Wozniakowski took a quite different approach, and estimated the maximum expected relative error $(\lambda_1 - \lambda_1^{(m)})/\lambda_1$ over all $n \times n$ symmetric positive definite matrices, resulting in error estimates that depend only on the dimension $n$ and the number of iterations $m$. They produced the estimate
$$\sup_{A \in S^n_{++}} \mathbb{E}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right] \le .103\, \frac{\ln^2\!\big(n(m-1)^4\big)}{(m-1)^2}, \tag{4}$$
for all $n \ge 8$ and $m \ge 4$, where $S^n_{++}$ is the set of all $n \times n$ symmetric positive definite matrices, and the expectation is with respect to the uniform probability measure on the hypersphere $S^{n-1} = \{x \in \mathbb{R}^n \,|\, \|x\| = 1\}$ [9, Theorem 3.2]. One can quickly verify that equation (4) also holds when the $\lambda_1(A)$ term in the denominator is replaced by $\lambda_1(A) - \lambda_n(A)$, and the supremum over $S^n_{++}$ is replaced by the maximum over the set of all $n \times n$ symmetric matrices, denoted by $S^n$.

Both of these approaches have their benefits and drawbacks. If an extremal eigenvalue is known to have a reasonably large eigenvalue gap (based on application or construction), then a distribution dependent estimate provides a very good approximation of error, even for small $m$. However, if the eigenvalue gap is not especially large, then distribution dependent estimates can significantly overestimate error, and estimates that depend only on $n$ and $m$ are preferable. This is illustrated by the following elementary, yet enlightening, example.

Example 1.1. Let $A \in S^n_{++}$ be the tridiagonal matrix resulting from the discretization of the Laplacian operator on an interval with Dirichlet boundary conditions, namely, $A_{i,i} = 2$ for $i = 1, ..., n$ and $A_{i,i+1} = A_{i+1,i} = -1$ for $i = 1, ..., n-1$. The eigenvalues of $A$ are given by $\lambda_i(A) = 2 + 2\cos(i\pi/(n+1))$, $i = 1, ..., n$. Consider the approximation of $\lambda_1(A)$ by $m$ iterations of the Lanczos method. For a random choice of $b$, the expected value of $\tan^2\angle(b,\varphi_1)$ is $(1 + o(1))\,n$. If $\tan^2\angle(b,\varphi_1) = Cn$ for some constant $C$, then (3) produces the estimate
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \le Cn\left(T_{m-1}\left(1 + 2\tan\left(\frac{\pi}{2(n+1)}\right)\tan\left(\frac{3\pi}{2(n+1)}\right)\right)\right)^{-2} \simeq n\big(1 + O(n^{-1})\big)^{-m} \simeq n.$$
In this instance, the estimate is a trivial one for all choices of $m$, which varies greatly from the error estimate (4) of order $\ln^2 n/m^2$. The exact same estimate holds for the smallest eigenvalue $\lambda_n(A)$ when the corresponding bounds are applied, since $4I - A$ is similar to $A$.

Now, consider the approximation of the largest eigenvalue of $B = A^{-1}$ by $m$ iterations of the Lanczos method. The matrix $B$ possesses a large gap between the largest and second-largest eigenvalue, which results in a value of $\gamma$ for (3) that remains bounded below by a constant independent of $n$, namely $\gamma = \big(2\cos(\pi/(n+1)) + 1\big)/\big(2\cos(\pi/(n+1)) - 1\big)$. Therefore, in this instance, the estimate (3) illustrates a constant convergence rate, produces non-trivial bounds (for typical $b$) for $m = \Theta(\ln n)$, and is preferable to the error estimate (4) of order $\ln^2 n/m^2$.
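As a quick numerical illustration (a sketch only: the sizes $n = 2000$ and $m = 25$ are arbitrary, and lanczos refers to the illustrative sketch following Algorithm 1), the observed relative error for $\lambda_1(A)$ here is far below the trivial gap-based bound:

import numpy as np

n, m = 2000, 25
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)          # Dirichlet Laplacian
lam = 2 + 2 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))   # eigenvalues, descending

b = np.random.default_rng(0).standard_normal(n)
T = lanczos(A, b, m)
ritz = np.linalg.eigvalsh(T).max()
print((lam[0] - ritz) / (lam[0] - lam[-1]))   # observed relative error, well below 1
print(np.log(n) ** 2 / m ** 2)                # scale of the uniform estimate (4)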

This example illustrates the benefits and drawbacks that both types of estimates possess. More generally, if a matrix $A$ has eigenvalue gap $\gamma \lesssim n^{-\alpha}$ and the initial vector $b$ satisfies $\tan^2\angle(b,\varphi_1) \gtrsim n$, then the error estimate (3) is a trivial one for $m \lesssim n^{\alpha/2}$. This implies that the estimate (3) is most useful when the eigenvalue gap is constant, or tends to zero very slowly (i.e., logarithmically). When the gap is not especially large (say, $n^{-\alpha}$, $\alpha$ constant), then uniform error estimates are preferable for small values of $m$.

In this work, we focus almost exclusively on uniform bounds, namely, error estimates that hold uniformly over some large set of matrices (typically $S^n_{++}$ or $S^n$). We begin by recalling some of the key existing uniform error estimates for the Lanczos method.

1.1. Related Work. Uniform error estimates for the Lanczos method have been produced almost exclusively for symmetric positive definite matrices, as error estimates for extremal eigenvalues of symmetric matrices can be produced from estimates for $S^n_{++}$ relatively easily. In addition, the majority of results apply only to either $\lambda_1(A)$, $\lambda_n(A)$, or some function of the two (i.e., condition number). All estimates are probabilistic in nature and take the initial vector $b$ to be uniformly distributed on the hypersphere. Here we provide a short, non-exhaustive description of some of the key uniform error estimates previously produced for the Lanczos method.

In [9], Kuczynski and Wozniakowski produced a complete analysis of the power method and provided a number of upper bounds for the Lanczos method. Most notably, they produced error estimate (4) and provided the following upper bound for the probability that the relative error $(\lambda_1 - \lambda_1^{(m)})/\lambda_1$ is greater than some value $\epsilon$:
$$\sup_{A \in S^n_{++}} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} > \epsilon\right] \le 1.648\,\sqrt{n}\, e^{-\sqrt{\epsilon}\,(2m-1)}. \tag{5}$$
However, the authors were unable to produce any lower bounds for (4) or (5), and stated that a more sophisticated analysis would most likely be required. In the same paper, numerical experiments were performed for the Lanczos method, and produced an error of the order $m^{-2}$, leading the authors to suggest that the error estimate (4) may be an overestimate, and that the $\ln^2 n$ term may be unnecessary.

In [10], Kuczynski and Wozniakowski noted that the above estimates immediately translate to relative error estimates for minimal eigenvalues when the error $\lambda_m^{(m)} - \lambda_n$ is considered relative to $\lambda_1$ or $\lambda_1 - \lambda_n$ (both normalizations can be shown to produce the same bound). However, it can be quickly verified that there exist sequences in $S^n_{++}$ for which the quantity $\mathbb{E}\big[(\lambda_m^{(m)} - \lambda_n)/\lambda_n\big]$ is unbounded. These results for minimal eigenvalues, combined with (4), led to error bounds for estimating the condition number of a matrix. Unfortunately, error estimates for the condition number face the same issue as the quantity $(\lambda_m^{(m)} - \lambda_n)/\lambda_n$, and, therefore, only estimates that depend on the value of the condition number can be produced.

The proof technique used to produce (4) works specifically for the quantity $\mathbb{E}\big[(\lambda_1 - \lambda_1^{(m)})/\lambda_1\big]$ (i.e., the 1-norm), and does not carry over to more general $p$-norms of the form $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^p\big]^{1/p}$, $p \in (1,\infty)$. Later, in [4, Theorem 5.2, r = 1], Del Corso and Manzini produced an upper bound for arbitrary $p$-norms, given by
$$\sup_{A \in S^n_{++}} \mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{1/p} \lesssim \frac{1}{m^{1/p}}\left(\frac{\Gamma(p - 1/2)\,\Gamma(n/2)}{\Gamma(p)\,\Gamma((n-1)/2)}\right)^{1/p} = \Theta\left(\left(\frac{\sqrt{n}}{m}\right)^{1/p}\right). \tag{6}$$
This bound is clearly worse than (4), but it is the best known bound for $p > 1$. Again, the authors were unable to produce any lower bounds.

1.2. Contributions and Remainder of Paper. In what follows, we obtain improved upper bounds for $\mathbb{E}\big[\big((\lambda_1 - \lambda_1^{(m)})/\lambda_1\big)^p\big]^{1/p}$ for all $p \in (1,\infty)$, and produce nearly-matching lower bounds. In particular, we prove the following theorem.

Theorem 1.2.
$$\max_{A \in S^n} \mathbb{E}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)}\right] \ge (1 - o(1)) \times \begin{cases} 1 & \text{for } m = o(\ln n) \\[6pt] 1 - \alpha & \text{for } m \le \dfrac{\ln n - \ln(\ln^2 n)}{4\ln\left(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\right)},\ \alpha \in (0,1] \\[6pt] \dfrac{.249\,\ln n}{m^2 \ln\ln n} & \text{for } m = O(\ln n) \\[6pt] \dfrac{.637}{m^2} & \text{for } m = o\left(\dfrac{n^{1/2}}{\ln^{1/2} n}\right), \end{cases}$$
and
$$\max_{A \in S^n} \mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)}\right)^p\right]^{1/p} \le .068\,\frac{\ln^2\!\big(n(m-1)^{8p}\big)}{(m-1)^2}, \quad \text{for } n \ge 100,\ m \ge 10,\ p \ge 1.$$

Proof. The result follows from Lemmas 3.4, 3.7, 3.10, and 4.1. □

By Hölder's inequality, the lower bounds in Theorem 1.2 also hold for arbitrary $p$-norms, $p \in [1,\infty)$. Through the course of producing the above estimates, we can make even stronger statements regarding bounds that occur with high probability. In particular, we also prove the following theorem.

Theorem 1.3.
$$\max_{A \in S^n} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge 1 - o(1)\right] \ge 1 - o(1/n) \quad \text{for } m = o(\ln n),$$
$$\max_{A \in S^n} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge (1 - o(1))(1 - \alpha)\right] \ge 1 - o(1/n) \quad \text{for } m \le \frac{\ln n - \ln(\ln^2 n)}{4\ln\left(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\right)},\ \alpha \in (0,1],$$
$$\max_{A \in S^n} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{.249\,\ln n}{m^2 \ln\ln n}\right] \ge 1 - o(1/n) \quad \text{for } m = O(\ln n),$$
$$\max_{A \in S^n} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \ge \frac{.637}{m^2}\right] \ge 1 - o(1/n) \quad \text{for } m = o\left(\frac{n^{1/2}}{\ln^{1/2} n}\right), \text{ and}$$
$$\max_{A \in S^n} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)} \le .571\,\frac{\ln^2\!\big(n(m-1)^{2/3}\big)}{(m-1)^2}\right] \ge 1 - o(1/n) \quad \text{for all } m > 1.$$

Proof. The result follows from Lemmas 3.4, 3.7, 3.10, and 4.2. □

These two results mark a major breakthrough in the understanding of the Lanczos algorithm and its error, as the above estimates are the first lower bounds produced for the Lanczos algorithm. Of course, the above (and all following) results are equally applicable to $\lambda_n$, as the Krylov subspace (and, therefore, the output of the Lanczos algorithm) is unaffected by shifting and scaling, namely $K_m(A,b) = K_m(\alpha A + \beta I, b)$ for all $\alpha, \beta \in \mathbb{R}$, $\alpha \ne 0$. To complement these two results, we also provide error analysis for matrices that have a certain structure. We produce improved dimension-free upper bounds for matrices that have some level of eigenvalue "regularity" near $\lambda_1$, and prove a powerful theorem that can be used to determine the asymptotic relative error for any sequence of matrices $X_n \in S^n$, $n = 1, 2, ...$, that exhibits suitable convergence of its empirical spectral distribution. As an example, we use this theorem to obtain a tight bound of order $m^{-2}$ for Wigner matrices. This example illustrates that the inverse quadratic dependence on the number of iterations $m$ in the estimates produced in this paper does not simply illustrate the worst case, but is actually indicative of the average case in some sense. More generally, we show that if the limit of the empirical spectral distribution can be bounded above and below by a Jacobi weight function, then the asymptotic order of convergence is $m^{-2}$. We also produce results similar to Theorems 1.2 and 1.3 for arbitrary eigenvalues $\lambda_i$. The lower bounds follow relatively quickly from the estimates for $\lambda_1$, but the upper bounds require some mild assumptions on the eigenvalue gaps of the matrix. These results mark the first uniform-type bounds for arbitrary eigenvalues. We translate our error estimates for the extremal eigenvalues $\lambda_1(A)$ and $\lambda_n(A)$ into error bounds for the condition number of a symmetric positive definite matrix, but, as previously mentioned, the relative error of the condition number of a matrix requires estimates that depend on the condition number itself. Finally, we present numerical experiments that support the accuracy and practical usefulness of the theoretical estimates in this work.

The remainder of the paper is as follows. In Section 2, we prove basic results regarding relative error and make a number of fairly straightforward observations. In Section 3, we prove asymptotic lower bounds for relative error. In Section 4, we produce improved upper bounds for the relative error in an arbitrary $p$-norm. In Section 5, we produce a dimension-free error estimate for a large class of matrices and prove a theorem that can be used to determine the asymptotic relative error for any sequence of matrices $X_n \in S^n$, $n = 1, 2, ...$, with suitable convergence of its empirical spectral distribution. We apply this theorem to Wigner matrices and obtain a tight estimate of relative error for these matrices and, more generally, a large class of matrices with well-behaved limiting empirical spectral distributions. In Section 6, under some mild additional assumptions, we prove versions of Theorems 1.2 and 1.3 for arbitrary eigenvalues $\lambda_i$, and extend our results for $\lambda_1$ and $\lambda_n$ to estimates for the condition number of a symmetric positive definite matrix. Finally, in Section 7, we perform a number of numerical experiments and discuss how these numerics compare to the theoretical estimates in this work.

2. Preliminary Results

The Lanczos method applies only to symmetric matrices, and, therefore, all matrices in this paper are assumed to belong to $S^n$. In fact, we may restrict ourselves to $S^n_{++}$, since results for $S^n_{++}$ are easily extended to $S^n$ by proper shifting and scaling. In particular, for any matrix $A \in S^n_{++}$ and $0 < \epsilon < \lambda_n(A)$, we have $A - \epsilon I \in S^n_{++}$ and
$$\frac{\lambda_1(A - \epsilon I) - \lambda_1^{(m)}(A - \epsilon I, b)}{\lambda_1(A - \epsilon I)} = \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \epsilon} > \frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}.$$
The quantity $(\lambda_1(A) - \lambda_1^{(m)}(A,b))/(\lambda_1(A) - \lambda_n(A))$ is unaffected by shifts, and so
$$\sup_{A \in S^n_{++}} \mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{1/p} = \max_{A \in S^n} \mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A) - \lambda_n(A)}\right)^p\right]^{1/p}. \tag{7}$$

Therefore, for the remainder of the paper, all matrices are also assumed to be positive definite. We begin by rewriting expression (2) for $\lambda_1^{(m)}$ in terms of polynomials. The Krylov subspace $K_m(A,b)$ can be alternatively defined as
$$K_m(A,b) = \{P(A)b \,|\, P \in \mathcal{P}_{m-1}\},$$


where $\mathcal{P}_{m-1}$ is the set of all real-valued polynomials of degree at most $m-1$. Suppose $A \in S^n_{++}$ has eigendecomposition $A = Q\Lambda Q^T$, where $Q \in \mathbb{R}^{n\times n}$ is an orthogonal matrix and $\Lambda \in \mathbb{R}^{n\times n}$ is the diagonal matrix satisfying $\Lambda(i,i) = \lambda_i(A)$, $i = 1, ..., n$. Then we have the relation
$$\lambda_1^{(m)} = \max_{\substack{x \in K_m(A,b)\\ x \ne 0}} \frac{x^T A x}{x^T x} = \max_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{b^T P^2(A) A b}{b^T P^2(A) b} = \max_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\tilde b^T P^2(\Lambda) \Lambda \tilde b}{\tilde b^T P^2(\Lambda) \tilde b},$$
where $\tilde b = Q^T b$. We may assume $\lambda_1(A) = 1$, since
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} = 1 - \max_{\substack{x \in K_m(A/\lambda_1, b)\\ x \ne 0}} \frac{x^T [A/\lambda_1] x}{x^T x}.$$
Assuming $\lambda_1(A) = 1$, the relative error is given by
$$\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} = \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n \tilde b_i^2 P^2(\lambda_i)(1 - \lambda_i)}{\sum_{i=1}^n \tilde b_i^2 P^2(\lambda_i)},$$
and the expected $p$th moment of the relative error is given by
$$\mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right] = \int_{S^{n-1}} \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \left[\frac{\sum_{i=2}^n \tilde b_i^2 P^2(\lambda_i)(1 - \lambda_i)}{\sum_{i=1}^n \tilde b_i^2 P^2(\lambda_i)}\right]^p d\sigma(\tilde b),$$
where $\sigma$ is the uniform probability measure on $S^{n-1}$. Because the relative error does not depend on the norm of $\tilde b$ or the sign of any entry, we can replace the integral over $S^{n-1}$ by an integral of $y = (y_1, ..., y_n)$ over $[0,\infty)^n$ with respect to the joint chi-square probability density function
$$f_Y(y) = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}\sum_{i=1}^n y_i\right\} \prod_{i=1}^n y_i^{-1/2}$$
of $n$ independent chi-square random variables $Y_1, ..., Y_n \sim \chi^2_1$ with one degree of freedom each. In particular, we have
$$\mathbb{E}_{b \sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right] = \int_{[0,\infty)^n} \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \left[\frac{\sum_{i=2}^n y_i P^2(\lambda_i)(1 - \lambda_i)}{\sum_{i=1}^n y_i P^2(\lambda_i)}\right]^p f_Y(y)\, dy. \tag{8}$$
Similarly, probabilistic estimates with respect to $b \sim U(S^{n-1})$ can be replaced by estimates with respect to $Y_1, ..., Y_n \sim \chi^2_1$, as
$$\mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} \ge \epsilon\right] = \mathbb{P}_{Y_i \sim \chi^2_1}\left[\min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}} \frac{\sum_{i=2}^n Y_i P^2(\lambda_i)(1 - \lambda_i)}{\sum_{i=1}^n Y_i P^2(\lambda_i)} \ge \epsilon\right]. \tag{9}$$
For the remainder of the paper, we almost exclusively use equation (8) for expected relative error and equation (9) for probabilistic bounds for relative error. If $P$ minimizes the expression in equation (8) or (9) for a given $y$ or $Y$, then any polynomial of the form $\alpha P$, $\alpha \in \mathbb{R}\setminus\{0\}$, is also a minimizer. Therefore, without any loss of generality, we can alternatively minimize over the set $\mathcal{P}_{m-1}(1) = \{Q \in \mathcal{P}_{m-1} \,|\, Q(1) = 1\}$.
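As a concrete illustration of (8) and (9): for a fixed sample $y$, the minimization over $\mathcal{P}_{m-1}$ is a generalized Rayleigh-quotient problem in the coefficients of $P$. The following sketch is illustrative only (the helper name relative_error, the monomial basis, and the random test spectrum are assumptions made for brevity; a Chebyshev basis would be better conditioned for larger $m$):

import numpy as np
from scipy.linalg import eigh

def relative_error(lam, y, m):
    # min over P in P_{m-1} of sum_i y_i P^2(lam_i)(1 - lam_i) / sum_i y_i P^2(lam_i),
    # posed as a generalized symmetric eigenproblem in the coefficients of P.
    V = np.vander(lam, m, increasing=True)      # row i: (1, lam_i, ..., lam_i^{m-1})
    M1 = V.T @ (V * (y * (1 - lam))[:, None])
    M2 = V.T @ (V * y[:, None])
    return eigh(M1, M2, eigvals_only=True)[0]   # smallest generalized eigenvalue

rng = np.random.default_rng(1)
n, m = 500, 8
lam = np.sort(rng.uniform(0, 1, n))[::-1]
lam[0] = 1.0                                    # scale so that lambda_1 = 1
y = rng.chisquare(1, n)                         # chi-square weights as in (8), (9)
print(relative_error(lam, y, m))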

3. Asymptotic Lower Bounds

In this section, we produce asymptotic lower bounds for the maximum expected relative error of $m$ iterations of the Lanczos method for estimating the largest eigenvalue of a matrix $A \in S^n_{++}$. By Hölder's inequality, lower bounds for arbitrary $p$-norms follow immediately from bounds for the 1-norm. The general technique we use to produce lower bounds is as follows. We choose an infinite sequence of matrices $\{A_n\}_{n=1}^\infty$, $A_n \in S^n_{++}$, treat $m$ as a function of $n$, and show that, as $n$ tends to infinity, for "most" choices of an initial vector $b$, the relative error of this sequence of matrices is well approximated by an integral polynomial minimization problem for which relatively tight bounds can be obtained.

The first goal of this section is to show that, contrary to experimental results and assertions of [9], the maximum expected relative error does indeed depend on $n$. We first consider the regime in which $m = O(\ln n)$, a region in which no non-constant upper bounds have been produced. We show that, as $n$ increases, the maximum expected relative error becomes arbitrarily close to one for any $m$ that is $o(\ln n)$, and that, in general, one cannot guarantee an expected relative error less than $.99$ until $m \ge \frac{1}{13}\ln n$.

In order to do so, we must first produce a lower bound for the $L^2([0,\alpha])$ norm, $\alpha \in (0,1]$, of all polynomials $Q \in \mathcal{P}_{m-1}(1)$. We prove the following lemma.

Lemma 3.1. Let $m \in \mathbb{N}$ and $\alpha \in (0,1]$. Then
$$\min_{Q \in \mathcal{P}_{m-1}(1)} \int_0^\alpha Q^2(y)\, dy \ge (1 + o(1))\,\frac{2^{15/4}\pi^{3/2}\sqrt{m-1}\,\alpha^{2m-1}}{m(1+\sqrt{2})^{4m}}.$$

Proof. Let $Q(y) = \sum_{i=0}^{m-1} a_i y^i$, $a_i \in \mathbb{R}$, $i = 0, ..., m-1$. Then
$$\int_0^\alpha Q^2(y)\, dy = \int_0^\alpha \sum_{i=0}^{2m-2} y^i \sum_{\substack{p+q=i\\ p,q\ge 0}} a_p a_q\, dy = \sum_{i=0}^{2m-2}\frac{\alpha^{i+1}}{i+1}\sum_{\substack{p+q=i\\ p,q\ge 0}} a_p a_q = \alpha\, a^T D H D a,$$
where $a = (a_0, a_1, ..., a_{m-1})^T \in \mathbb{R}^m$, $D \in \mathbb{R}^{m\times m}$ is the diagonal matrix satisfying $D(i,i) = \alpha^i$, $i = 0, ..., m-1$, and $H \in \mathbb{R}^{m\times m}$ is the Hilbert matrix $H(i,j) = (i+j+1)^{-1}$, $i,j = 0, ..., m-1$. We have converted our minimization problem into the minimization of the quadratic form $\alpha\, a^T DHD a$ subject to the condition $a^T \mathbf{1} = 1$, where $\mathbf{1} = (1, ..., 1)^T \in \mathbb{R}^m$.

The Hilbert matrix $H$ is positive definite and, by [19, Equation 3.35], its smallest eigenvalue equals
$$\lambda_{\min}(H) = (1 + o(1))\,\pi^{3/2}(8 + 6\sqrt{2})^{5/2}\sqrt{m-1}\,(1+\sqrt{2})^{-4m-5}.$$
The condition $a^T\mathbf{1} = 1$ implies that $\|a\|_1 \ge 1$. Therefore,
$$\|Da\|_2 \ge \frac{\|Da\|_1}{\sqrt{m}} \ge \frac{\alpha^{m-1}}{\sqrt{m}}$$
and
$$\alpha\, a^T DHD a \ge \alpha\,\lambda_{\min}(H)\|Da\|_2^2 \ge (1 + o(1))\,\frac{\pi^{3/2}(8+6\sqrt{2})^{5/2}}{(1+\sqrt{2})^5}\,\frac{\sqrt{m-1}\,\alpha^{2m-1}}{m(1+\sqrt{2})^{4m}}.$$
Noting that $(8+6\sqrt{2})^{5/2}(1+\sqrt{2})^{-5} = 2^{15/4}$ completes the proof. □

This lower bound is only as tight as our purposes require, and tighter results can be obtained at the cost of brevity. In order to convert Lemma 3.1 into an asymptotic lower bound for the error of the Lanczos method, we require the following two propositions regarding Gaussian quadrature and Chernoff bounds for chi-square random variables.

Proposition 3.2 ([3, Chapter 1.13], [6]). Let $P \in \mathcal{P}_{2k-1}$, $\{x_i\}_{i=1}^k$ be the zeros of the $k$th degree Legendre polynomial $P_k(x)$, $P_k(1) = 1$, in descending order ($x_1 > ... > x_k$), and $w_i = 2(1-x_i^2)^{-1}[P_k'(x_i)]^{-2}$, $i = 1, ..., k$. Then
$$\text{1.}\quad \int_{-1}^1 P(x)\, dx = \sum_{i=1}^k w_i P(x_i),$$
$$\text{2.}\quad x_i = \left(1 - \frac{1}{8k^2}\right)\cos\left(\frac{(4i-1)\pi}{4k+2}\right) + O(k^{-3}), \quad i = 1, ..., k,$$
$$\text{3.}\quad \frac{\pi}{k+1/2}\sqrt{1-x_1^2}\left(1 - \frac{1}{8(k+1/2)^2(1-x_1^2)}\right) \le w_1 < w_2 < ... < w_{\lfloor\frac{k+1}{2}\rfloor} \le \frac{\pi}{k+1/2}.$$
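A quick numerical check of parts 1 and 3 of Proposition 3.2, using NumPy's Gauss–Legendre routine (the degree $k$ and random polynomial are arbitrary illustrative choices):

import numpy as np

k = 12
x, w = np.polynomial.legendre.leggauss(k)             # nodes and weights
P = np.polynomial.Polynomial(np.random.default_rng(2).standard_normal(2 * k))
exact = P.integ()(1.0) - P.integ()(-1.0)              # exact integral over [-1, 1]
print(np.allclose(w @ P(x), exact))                   # part 1: exact on P_{2k-1}
print(w.max() <= np.pi / (k + 0.5))                   # part 3: weight upper bound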

Proposition 3.3. Let $Z \sim \chi^2_k$. Then
$$\mathbb{P}[Z \le x] \le \left[\frac{x}{k}\exp\left\{1 - \frac{x}{k}\right\}\right]^{k/2} \quad \text{for } x \le k,$$
and
$$\mathbb{P}[Z \ge x] \le \left[\frac{x}{k}\exp\left\{1 - \frac{x}{k}\right\}\right]^{k/2} \quad \text{for } x \ge k.$$

Proof. The result follows from taking [2, Lemma 2.2] and letting $d \to \infty$. □
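A quick Monte Carlo sketch of the upper-tail case of Proposition 3.3 (the parameters $k$, $x$, and sample size are arbitrary illustrative choices):

import numpy as np

k, x, N = 10, 25.0, 10**6                          # upper-tail case, x >= k
z = np.random.default_rng(4).chisquare(k, N)
print((z >= x).mean())                             # empirical P[Z >= x]
print((x / k * np.exp(1 - x / k)) ** (k / 2))      # Chernoff bound; larger, as claimed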

We are now prepared to prove our desired lower bound.

Lemma 3.4. Let $\alpha \in (0,1]$. If
$$m \le \frac{\ln n - \ln(\ln^2 n)}{4\ln\left(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\right)},$$
then
$$\sup_{A \in S^n_{++}} \mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} \ge (1 - o(1))(1 - \alpha)\right] \ge 1 - o(1/n),$$
and, therefore,
$$\sup_{A \in S^n_{++}} \mathbb{E}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right] \ge (1 - o(1))(1 - \alpha).$$
In addition, if $m = o(\ln n)$, then the maximum expected error is $1 - o(1)$.

Proof. The main idea of the proof is to choose a matrix with eigenvalues that partition $[0,\alpha]$, use equation (9) for probabilistic estimates, and show that a large subset of $[0,\infty)^n$ (with respect to $f_Y(y)$) satisfies certain conditions that allow us to lower bound relative error using the integral minimization problem of Lemma 3.1. Throughout the proof, we assume $m = O(\ln n)$.

Let $A \in \mathbb{R}^{n\times n}$ have eigenvalue $\lambda_1 = 1$ with multiplicity one, and the remaining eigenvalues given by $\alpha(x_j+1)/2$, $j = 1, ..., m$, where $x_j$ is the $j$th zero of the $m$th degree Legendre polynomial, and each eigenvalue has multiplicity at least $\lfloor (n-1)/m\rfloor$. By equation (9),
$$\mathbb{P}_{b \sim U(S^{n-1})}\left[\frac{\lambda_1(A) - \lambda_1^{(m)}(A,b)}{\lambda_1(A)} \ge \epsilon\right] = \mathbb{P}_{Y_i \sim \chi^2_1}\left[\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\sum_{i=2}^n Y_i Q^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_i Q^2(\lambda_i)} \ge \epsilon\right].$$
We have
$$\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\sum_{i=2}^n Y_i Q^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_i Q^2(\lambda_i)} \ge (1-\alpha)\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\sum_{j=1}^m \overline{Y}_j Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)}{Y_1 + \sum_{j=1}^m \overline{Y}_j Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)},$$
where $Y_1, ..., Y_n \sim \chi^2_1$, and $\overline{Y}_j$ is the sum of the $Y_i$'s that satisfy $\lambda_i = \alpha(x_j+1)/2$; therefore, each $\overline{Y}_j$ has at least $\lfloor(n-1)/m\rfloor$ degrees of freedom. Then, by Proposition 3.3,
$$\mathbb{P}[Y_1 \ge 3\ln n] \le \left(\frac{3e\ln n}{n^3}\right)^{1/2} = o(1/n),$$
and
$$\mathbb{P}\left[\overline{Y}_j \ge \frac{1}{2}\left\lfloor\frac{n-1}{m}\right\rfloor,\ j = 1, ..., m\right] \ge 1 - m\left(\frac{e^{1/2}}{2}\right)^{\frac{1}{2}\lfloor\frac{n-1}{m}\rfloor} = 1 - o(1/n).$$
This implies that, with probability $1 - o(1/n)$, we can upper bound $Y_1$ by $3\ln n$ and lower bound $\overline{Y}_j$ by $\lfloor(n-1)/m\rfloor/2$ for all $j = 1, ..., m$, which produces the lower bound
$$\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\sum_{i=2}^n Y_i Q^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_i Q^2(\lambda_i)} \ge (1-\alpha)\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\frac{1}{2}\lfloor\frac{n-1}{m}\rfloor\sum_{j=1}^m Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)}{3\ln n + \frac{1}{2}\lfloor\frac{n-1}{m}\rfloor\sum_{j=1}^m Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)}.$$
Because $\{x_j\}_{j=1}^m$ are the zeros of the $m$th degree Legendre polynomial, we have
$$\sum_{j=1}^m w_j Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right) = \int_{-1}^1 Q^2\!\left(\frac{\alpha(x+1)}{2}\right)dx = \frac{2}{\alpha}\int_0^\alpha Q^2(y)\,dy,$$
where $w_j$ are the corresponding weights, given by $w_j = 2(1-x_j^2)^{-1}[P_m'(x_j)]^{-2}$. By Proposition 3.2, $0 < w_j \le \pi/(m+1/2)$. By Lemma 3.1,
$$\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\sum_{i=2}^n y_i Q^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n y_i Q^2(\lambda_i)} \ge (1-\alpha)\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\frac{m+1/2}{2\pi}\lfloor\frac{n-1}{m}\rfloor\sum_{j=1}^m w_j Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)}{3\ln n + \frac{m+1/2}{2\pi}\lfloor\frac{n-1}{m}\rfloor\sum_{j=1}^m w_j Q^2\!\left(\frac{\alpha(x_j+1)}{2}\right)}$$
$$= (1-\alpha)\min_{Q \in \mathcal{P}_{m-1}(1)}\frac{\frac{m+1/2}{\alpha\pi}\lfloor\frac{n-1}{m}\rfloor\int_0^\alpha Q^2(y)\,dy}{3\ln n + \frac{m+1/2}{\alpha\pi}\lfloor\frac{n-1}{m}\rfloor\int_0^\alpha Q^2(y)\,dy} \ge (1-\alpha)\,\frac{M(n,m,\alpha)}{3\ln n + M(n,m,\alpha)},$$
where
$$M(n,m,\alpha) = (1+o(1))\,2^{15/4}\pi^{1/2}\,\frac{(m+1/2)(m-1)^{1/2}}{\alpha^2 m}\left\lfloor\frac{n-1}{m}\right\rfloor\left(\frac{\sqrt{\alpha}}{1+\sqrt{2}}\right)^{4m}.$$
Let $m$ and $\alpha$ satisfy $m = O(\ln n)$, $1/\alpha = O(\ln\ln n)$, and
$$\left(\frac{\sqrt{\alpha}}{1+\sqrt{2}}\right)^{4m} \ge \frac{\ln^2 n}{n}.$$
Then $M(n,m,\alpha)/\ln n \to \infty$ as $n \to \infty$, and we immediately obtain an asymptotic lower bound of $1 - \lim_{n\to\infty}\alpha(n)$, provided this limit exists. The first and second parts of the lemma follow from taking $\alpha$ to be a fixed constant and
$$m \le \frac{\ln n - \ln(\ln^2 n)}{4\ln\left(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\right)}.$$
The last statement follows from noting that if $m = o(\ln n)$, then $[(\ln^2 n)/n]^{1/(4m)} = o(1)$. □

This result proves that the maximum expected error does depend on the dimension $n$ of the matrix, as no non-trivial error estimates can be produced when $m = o(\ln n)$. One sample estimate that follows immediately from the previous lemma is that, if $m \le \frac{1}{13}\ln n$, then the maximum expected error is at least $(1 - o(1))(1 - 10^{-2})$. The bound in Lemma 3.4 worsens exponentially as $m$ increases, and gives no indication whether the estimate $m^{-2}$ is the correct dependence on $m$.

Next, we provide a lower bound of order $m^{-2}$. Again, we begin by estimating a related integral polynomial minimization problem. However, for this minimization problem, rather than represent $\mathcal{P}_{m-1}$ in the standard basis $\{y^i\}_{i=0}^{m-1}$, we consider a basis consisting of shifted and scaled Jacobi polynomials. We first recall a number of basic facts regarding Jacobi polynomials. For a more detailed introduction, we refer the reader to [16].

Proposition 3.5 ([16], Chapter 3.2). Let $\{J_k^{\alpha,\beta}(x)\}_{k=0}^\infty$, $\alpha,\beta > -1$, be the orthogonal system of Jacobi polynomials over $[-1,1]$ with respect to the weight function $\omega_{\alpha,\beta}(x) = (1-x)^\alpha(1+x)^\beta$, namely,
$$J_k^{\alpha,\beta}(x) = \frac{\Gamma(k+\alpha+1)}{k!\,\Gamma(k+\alpha+\beta+1)}\sum_{i=0}^k\binom{k}{i}\frac{\Gamma(k+i+\alpha+\beta+1)}{\Gamma(i+\alpha+1)}\left(\frac{x-1}{2}\right)^i.$$
Then
$$\text{(i)}\quad \int_{-1}^1\left[J_k^{\alpha,\beta}(x)\right]^2\omega_{\alpha,\beta}(x)\,dx = \frac{2^{\alpha+\beta+1}\,\Gamma(k+\alpha+1)\,\Gamma(k+\beta+1)}{(2k+\alpha+\beta+1)\,k!\,\Gamma(k+\alpha+\beta+1)},$$
$$\text{(ii)}\quad J_k^{\alpha+1,\beta}(x) = \frac{\Gamma(k+\beta+1)}{\Gamma(k+\alpha+\beta+2)}\sum_{\ell=0}^k\frac{(2\ell+\alpha+\beta+1)\,\Gamma(\ell+\alpha+\beta+1)}{\Gamma(\ell+\beta+1)}\,J_\ell^{\alpha,\beta}(x),$$
$$\text{(iii)}\quad \max_{x\in[-1,1]}\left|J_k^{\alpha,\beta}(x)\right| = \max_{x\in\{-1,1\}}\left|J_k^{\alpha,\beta}(x)\right| = \max\left\{\frac{\Gamma(k+\alpha+1)}{k!\,\Gamma(\alpha+1)},\ \frac{\Gamma(k+\beta+1)}{k!\,\Gamma(\beta+1)}\right\},$$
$$\text{(iv)}\quad \frac{d}{dx}J_k^{\alpha,\beta}(x) = \frac{k+\alpha+\beta+1}{2}\,J_{k-1}^{\alpha+1,\beta+1}(x),$$
$$\text{(v)}\quad J_k^{\alpha,\beta}(-x) = (-1)^k J_k^{\beta,\alpha}(x).$$

For the sake of brevity, the inner product and norm on the interval $[-1,1]$ with respect to the weight $\omega_{\alpha,\beta}(x) = (1-x)^\alpha(1+x)^\beta$ will be denoted by $\langle\cdot,\cdot\rangle_{\alpha,\beta}$ and $\|\cdot\|_{\alpha,\beta}$, namely
$$\langle P, Q\rangle_{\alpha,\beta} = \int_{-1}^1 P(x)Q(x)(1-x)^\alpha(1+x)^\beta\,dx$$
and
$$\|P\|_{\alpha,\beta} = \left(\int_{-1}^1 P^2(x)(1-x)^\alpha(1+x)^\beta\,dx\right)^{1/2}.$$
Using the alternate basis of Jacobi polynomials for $\mathcal{P}_{m-1}$ and Proposition 3.5, we present estimates for the following minimization problem. The following lemma is a key ingredient in a number of estimates of this paper, but is also of independent interest in the theory of Jacobi polynomials and weight functions.

Lemma 3.6. Let $m \in \mathbb{N}$, and $\alpha > 0$, $\beta > -1$. Then
$$\frac{2\alpha}{m(m+\alpha+\beta)} \le \min_{\substack{P \in \mathcal{P}_{m-1}\\ P \ne 0}}\frac{\|P\|^2_{\alpha,\beta}}{\|P\|^2_{\alpha-1,\beta}} \le \frac{6\alpha\,(2+\alpha/2)^\alpha}{m(m+\alpha+\beta)}.$$

Proof. Let $P(y) = \sum_{i=0}^{m-1} a_i\,J_i^{\alpha,\beta}(y)\,\|J_i^{\alpha,\beta}\|^{-1}_{\alpha,\beta}$, $a_i \in \mathbb{R}$, $i = 0, ..., m-1$. Then
$$\|P\|^2_{\alpha,\beta} = \sum_{i=0}^{m-1} a_i^2 \quad\text{and}\quad \|P\|^2_{\alpha-1,\beta} = \sum_{i,j=0}^{m-1} a_i a_j\,\frac{\langle J_i^{\alpha,\beta}, J_j^{\alpha,\beta}\rangle_{\alpha-1,\beta}}{\|J_i^{\alpha,\beta}\|_{\alpha,\beta}\|J_j^{\alpha,\beta}\|_{\alpha,\beta}}.$$
By equations (i) and (ii) of Proposition 3.5,
$$\langle J_i^{\alpha,\beta}, J_j^{\alpha,\beta}\rangle_{\alpha-1,\beta} = \frac{\Gamma(i+\beta+1)}{\Gamma(i+\alpha+\beta+1)}\,\frac{\Gamma(j+\beta+1)}{\Gamma(j+\alpha+\beta+1)}\sum_{r=0}^{\min\{i,j\}}\left[\frac{(2r+\alpha+\beta)\,\Gamma(r+\alpha+\beta)}{\Gamma(r+\beta+1)}\right]^2\|J_r^{\alpha-1,\beta}\|^2_{\alpha-1,\beta}$$
$$= 2^{\alpha+\beta}\,\frac{\Gamma(i+\beta+1)}{\Gamma(i+\alpha+\beta+1)}\,\frac{\Gamma(j+\beta+1)}{\Gamma(j+\alpha+\beta+1)}\sum_{r=0}^{\min\{i,j\}}\frac{(2r+\alpha+\beta)\,\Gamma(r+\alpha+\beta)\,\Gamma(r+\alpha)}{r!\,\Gamma(r+\beta+1)}$$
$$= 2^{\alpha+\beta}\,\frac{\Gamma(i+\beta+1)}{\Gamma(i+\alpha+\beta+1)}\,\frac{\Gamma(j+\beta+1)}{\Gamma(j+\alpha+\beta+1)}\,\frac{\Gamma(j+\alpha+1)\,\Gamma(j+\alpha+\beta+1)}{\alpha\,j!\,\Gamma(j+\beta+1)} = \frac{2^{\alpha+\beta}}{\alpha}\,\frac{\Gamma(i+\beta+1)\,\Gamma(j+\alpha+1)}{j!\,\Gamma(i+\alpha+\beta+1)}$$
for $i \ge j$, and, therefore,
$$\frac{\langle J_i^{\alpha,\beta}, J_j^{\alpha,\beta}\rangle_{\alpha-1,\beta}}{\|J_i^{\alpha,\beta}\|_{\alpha,\beta}\|J_j^{\alpha,\beta}\|_{\alpha,\beta}} = \frac{1}{2\alpha}\left[\frac{\Gamma(i+\beta+1)}{\Gamma(j+\beta+1)}\,\frac{\Gamma(j+\alpha+1)}{\Gamma(i+\alpha+1)}\,\frac{i!}{j!}\,\frac{\Gamma(j+\alpha+\beta+1)}{\Gamma(i+\alpha+\beta+1)}\,(2i+\alpha+\beta+1)(2j+\alpha+\beta+1)\right]^{1/2}$$
for $i \ge j$. The Rayleigh quotient $\|P\|^2_{\alpha-1,\beta}/\|P\|^2_{\alpha,\beta}$ corresponds to a positive definite matrix, and, therefore, can be bounded above by the trace of this matrix. This gives us
$$\max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\|P\|^2_{\alpha-1,\beta}}{\|P\|^2_{\alpha,\beta}} \le \sum_{i=0}^{m-1}\frac{2i+\alpha+\beta+1}{2\alpha} = \frac{m(m+\alpha+\beta)}{2\alpha},$$
which completes the proof of the lower bound.

To obtain an upper bound, we evaluate the Rayleigh quotient $\|P\|^2_{\alpha-1,\beta}/\|P\|^2_{\alpha,\beta}$ of the vector $a$ satisfying $a_i = 1$ for $i \ge \lceil m/2\rceil$, and $a_i = 0$ otherwise. For simplicity, we assume $m \ge 3$ (and so $\lceil m/2\rceil \ge 2$). This quantity is bounded below by
$$\frac{(m-\lceil m/2\rceil)^2(2\lceil m/2\rceil+\alpha+\beta+1)}{2\alpha\,(m-\lceil m/2\rceil)}\min_{\lceil m/2\rceil\le j\le i\le m-1}\left[\frac{\Gamma(i+\beta+1)}{\Gamma(j+\beta+1)}\,\frac{\Gamma(j+\alpha+1)}{\Gamma(i+\alpha+1)}\,\frac{i!}{j!}\,\frac{\Gamma(j+\alpha+\beta+1)}{\Gamma(i+\alpha+\beta+1)}\right]^{1/2}.$$
We can lower bound the product of Gamma functions by
$$\frac{\Gamma(i+\beta+1)}{\Gamma(j+\beta+1)}\,\frac{\Gamma(j+\alpha+1)}{\Gamma(i+\alpha+1)}\,\frac{i!}{j!}\,\frac{\Gamma(j+\alpha+\beta+1)}{\Gamma(i+\alpha+\beta+1)} \ge \frac{j^\alpha(j+\beta)^\alpha}{(i+\alpha)^\alpha(i+\alpha+\beta)^\alpha} \ge \left(\frac{j}{i+\alpha}\right)^{2\alpha} \ge \left(\frac{2}{4+\alpha}\right)^{2\alpha},$$
which implies that
$$\max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\|P\|^2_{\alpha-1,\beta}}{\|P\|^2_{\alpha,\beta}} \ge \frac{m(m+\alpha+\beta+1)}{6\alpha\,(2+\alpha/2)^\alpha}$$
for $m \ge 3$. Noting that
$$\frac{\|1\|^2_{\alpha-1,\beta}}{\|1\|^2_{\alpha,\beta}} = \frac{\alpha+\beta+1}{2\alpha} \quad\text{and}\quad \frac{\|J_1^{\alpha,\beta}\|^2_{\alpha-1,\beta}}{\|J_1^{\alpha,\beta}\|^2_{\alpha,\beta}} = \frac{\alpha+\beta+3}{2\alpha}$$
completes the proof. □
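The bracketing in Lemma 3.6 is easy to check numerically with Gauss–Jacobi quadrature. The sketch below is illustrative (the parameter values are arbitrary, and min_ratio is a hypothetical helper); the rule is exact here, since every integrand is a polynomial of degree at most $2m-1$:

import numpy as np
from scipy.linalg import eigh
from scipy.special import eval_jacobi, roots_jacobi

def min_ratio(m, a, b):
    # min over P in P_{m-1} of ||P||^2_{a,b} / ||P||^2_{a-1,b}, computed with an
    # m-node Gauss-Jacobi rule for the weight (1-x)^(a-1) (1+x)^b.
    x, w = roots_jacobi(m, a - 1, b)
    J = np.array([eval_jacobi(r, a, b, x) for r in range(m)])   # basis values at nodes
    M_num = (J * (w * (1 - x))) @ J.T       # Gram matrix of <J_r, J_s>_{a, b}
    M_den = (J * w) @ J.T                   # Gram matrix of <J_r, J_s>_{a-1, b}
    return eigh(M_num, M_den, eigvals_only=True)[0]

m, a, b = 20, 3.0, 0.0
lo = 2 * a / (m * (m + a + b))
hi = 6 * a * (2 + a / 2) ** a / (m * (m + a + b))
print(lo <= min_ratio(m, a, b) <= hi)       # True: the Lemma 3.6 bracketing holds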

We are now prepared to prove a lower bound of order $m^{-2}$ for maximum expected error.

Lemma 3.7. If $m = o\left(\frac{n^{1/2}}{\ln^{1/2} n}\right)$, then
$$\sup_{A\in S^n_{++}}\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} \ge \frac{.637}{m^2}\right] \ge 1 - o(1/n),$$
and, therefore,
$$\sup_{A\in S^n_{++}}\mathbb{E}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right] \ge (1 - o(1))\,\frac{.637}{m^2}.$$

Proof. This proof follows a similar structure to that of Lemma 3.4. We choose a matrix with eigenvalues based on the zeros of a Legendre polynomial, and show that a large subset of $[0,\infty)^n$ (with respect to $f_Y(y)$) satisfies conditions that allow us to lower bound the relative error using the integral minimization problem of Lemma 3.6. For simplicity, we do not scale our matrix to have spectral radius one (i.e., $\lambda_1(A) \ne 1$).

Let $x_1 > ... > x_{2m}$ be the zeros of the $(2m)$th degree Legendre polynomial. Let $A \in \mathbb{R}^{n\times n}$ have eigenvalue $(x_1+1)/2$ with multiplicity one, and remaining eigenvalues given by $(x_j+1)/2$, $j = 2, ..., 2m$, each with multiplicity at least $\lfloor(n-1)/(2m-1)\rfloor$. By equation (9),
$$\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\ge\epsilon\right] = \mathbb{P}_{Y_i\sim\chi^2_1}\left[\frac{1}{\lambda_1}\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{i=2}^n Y_iP^2(\lambda_i)(\lambda_1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)}\ge\epsilon\right].$$
We have
$$\frac{1}{\lambda_1}\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{i=2}^n Y_iP^2(\lambda_i)(\lambda_1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)} = \frac{2}{x_1+1}\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=2}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)\left(\frac{x_1+1}{2}-\frac{x_j+1}{2}\right)}{\sum_{j=1}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)},$$
where $Y_1,...,Y_n\sim\chi^2_1$, and $\overline{Y}_j$ is the sum of the $Y_i$ that satisfy $\lambda_i = (x_j+1)/2$. $\overline{Y}_1$ has one degree of freedom, and $\overline{Y}_j$, $j = 2,...,2m$, each have at least $\lfloor(n-1)/(2m-1)\rfloor$ degrees of freedom. Let $w_1,...,w_{2m}$ be the weights of Gaussian quadrature associated with $x_1,...,x_{2m}$. By Proposition 3.2,
$$x_1 = \left(1-\frac{1}{32m^2}\right)\cos\left(\frac{3\pi}{8m+2}\right)+O(m^{-3}) = 1-\frac{4+9\pi^2}{128m^2}+O(m^{-3}),$$
$$x_2 = \left(1-\frac{1}{32m^2}\right)\cos\left(\frac{7\pi}{8m+2}\right)+O(m^{-3}) = 1-\frac{4+49\pi^2}{128m^2}+O(m^{-3}),$$
and, therefore, $1-x_1^2 = \frac{4+9\pi^2}{64m^2}+O(m^{-3})$. Again, by Proposition 3.2, we can lower bound the smallest ratio between weights by
$$\frac{w_1}{w_m} \ge \sqrt{1-x_1^2}\left(1-\frac{1}{8(2m+1/2)^2(1-x_1^2)}\right) = \left(\frac{\sqrt{4+9\pi^2}}{8m}+O(m^{-2})\right)\left(1-\frac{1}{(4+9\pi^2)/2+O(m^{-1})}\right) = \frac{2+9\pi^2}{8\sqrt{4+9\pi^2}\,m}+O(m^{-2}).$$
Therefore, by Proposition 3.3,
$$\mathbb{P}\left[\min_{j=2,...,2m}\overline{Y}_j \ge \frac{w_m}{w_1}\overline{Y}_1\right] \ge \mathbb{P}\left[\overline{Y}_j\ge\frac{1}{3}\left\lfloor\frac{n-1}{2m-1}\right\rfloor,\ j=2,...,2m\right]\times\mathbb{P}\left[\overline{Y}_1\le\frac{w_1}{3w_m}\left\lfloor\frac{n-1}{2m-1}\right\rfloor\right]$$
$$\ge\left(1-(2m-1)\left(\frac{e^{2/3}}{3}\right)^{\frac12\lfloor\frac{n-1}{2m-1}\rfloor}\right)\left(1-\left(\frac{w_1}{3w_m}\left\lfloor\frac{n-1}{2m-1}\right\rfloor e^{1-\frac{w_1}{3w_m}\lfloor\frac{n-1}{2m-1}\rfloor}\right)^{1/2}\right) = 1-o(1/n).$$
We now restrict our attention to values of $Y = (Y_1,...,Y_n)\in[0,\infty)^n$ that satisfy $\min_{j=2,...,2m}\overline{Y}_j \ge \frac{w_m}{w_1}\overline{Y}_1$. If, for some fixed choice of $Y$,
$$\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=2}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)\left(\frac{x_1+1}{2}-\frac{x_j+1}{2}\right)}{\sum_{j=1}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)} \le \frac{x_1+1}{2}-\frac{x_2+1}{2},\tag{10}$$
then, by Proposition 3.2 and Lemma 3.6,
$$\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=2}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)\left(\frac{x_1+1}{2}-\frac{x_j+1}{2}\right)}{\sum_{j=1}^{2m}\overline{Y}_jP^2\!\left(\frac{x_j+1}{2}\right)} \ge \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=2}^{2m}w_jP^2\!\left(\frac{x_j+1}{2}\right)\left(\frac{x_1+1}{2}-\frac{x_j+1}{2}\right)}{\sum_{j=1}^{2m}w_jP^2\!\left(\frac{x_j+1}{2}\right)}$$
$$= \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^{2m}w_jP^2\!\left(\frac{x_j+1}{2}\right)\left(1-\frac{x_j+1}{2}\right)}{\sum_{j=1}^{2m}w_jP^2\!\left(\frac{x_j+1}{2}\right)}-\left(1-\frac{x_1+1}{2}\right)$$
$$= \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\int_0^1P^2(y)(1-y)\,dy}{\int_0^1P^2(y)\,dy}-\frac{4+9\pi^2}{256m^2}+O(m^{-3}) \ge \left(1-\frac{4+9\pi^2}{256}\right)\frac{1}{m^2}+O(m^{-3}).$$
Alternatively, if equation (10) does not hold, then we can lower bound the left-hand side of (10) by
$$\frac{x_1+1}{2}-\frac{x_2+1}{2} = \frac{5\pi^2}{32m^2}+O(m^{-3}).$$
This completes the proof. □
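A short numerical sketch of this construction (the multiplicity, repetition count, and $m$ are arbitrary illustrative choices, and lanczos again refers to the sketch following Algorithm 1); the observed relative error should be on the same scale as $.637/m^2$:

import numpy as np

m, mult = 12, 40
x, _ = np.polynomial.legendre.leggauss(2 * m)      # zeros of P_{2m}, ascending
vals = (x[::-1] + 1) / 2                           # eigenvalues (x_j + 1)/2, descending
lam = np.concatenate(([vals[0]], np.repeat(vals[1:], mult)))
A = np.diag(lam)                                   # a diagonal test matrix suffices

rng = np.random.default_rng(3)
errs = [(lam[0] - np.linalg.eigvalsh(lanczos(A, rng.standard_normal(lam.size), m)).max())
        / lam[0] for _ in range(50)]
print(np.median(errs), 0.637 / m**2)               # comparable orders of magnitude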

The previous lemma shows that the $m^{-2}$ dependence on the number of iterations in upper bounds for relative error is in fact necessary. We now present a lower bound of the form $\ln n/(m^2\ln\ln n)$, which is, in essence, a combination of the two previous lower bounds. However, due to the distribution of eigenvalues required to produce this bound, Gaussian quadrature is no longer exact, and we must make use of estimates for basic composite quadrature. We recall the following basic proposition.


Proposition 3.8. If $f \in C^1[a,b]$, then for $a = x_0 < ... < x_n = b$,
$$\left|\int_a^b f(x)\,dx - \sum_{i=1}^n(x_i-x_{i-1})f(x_i^*)\right| \le \frac{b-a}{2}\,\max_{x\in[a,b]}|f'(x)|\,\max_{i=1,...,n}(x_i-x_{i-1}),$$
where $x_i^* \in [x_{i-1},x_i]$, $i = 1, ..., n$.

However, we will often be using quadrature for functions that, as a result of our eigenvalue distribution, do not belong to $C^1[0,1]$. In order to avoid this issue, we will make use of monotonicity properties, which requires estimating the largest zero of Jacobi polynomials. We present the following proposition, which is a restatement of an upper bound for the zeros of Gegenbauer polynomials, combined with a fairly standard interlacing theorem.

Proposition 3.9. Let $\alpha \ge \beta > -11/12$. Then the largest zero $x_1$ of $J_k^{\alpha,\beta}(x)$, $k > 1$, is bounded above by
$$x_1 \le \sqrt{1-\left(\frac{\alpha+3/2}{k+\alpha+1/2}\right)^2}.$$

Proof. By [11, Lemma 3.5], the largest zero $x_1$ of the $k$th degree Gegenbauer polynomial (i.e., $J_k^{\lambda-1/2,\lambda-1/2}(x)$), with $\lambda > -5/12$, satisfies
$$x_1^2 \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2+3\lambda+5/4+3(\lambda+1/2)^2/(k-1)} \le \frac{(k-1)(k+2\lambda+1)}{(k+\lambda)^2} = 1-\left(\frac{\lambda+1}{k+\lambda}\right)^2.$$
By [5, Theorem 2.1], the largest zero of $J_k^{\alpha,\beta+t}(x)$ is strictly greater than the largest zero of $J_k^{\alpha,\beta}(x)$ for any $t > 0$. As $\alpha \ge \beta$, combining these two facts provides our desired result. □

We are now prepared to prove the following lower bound.

Lemma 3.10. If $m = O(\ln n)$, then
$$\sup_{A\in S^n_{++}}\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} \ge \frac{.249\,\ln n}{m^2\ln\ln n}\right] \ge 1-o(1/n),$$
and, therefore,
$$\sup_{A\in S^n_{++}}\mathbb{E}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right] \ge (1-o(1))\,\frac{.249\,\ln n}{m^2\ln\ln n}.$$

Proof. The structure of the proof is similar in concept to that of Lemma 3.7, but requires much more work. As in the proof of Lemma 3.7, we choose a matrix with eigenvalues based on a function corresponding to the integral minimization problem of Lemma 3.6, and show that a large subset of $[0,\infty)^n$ (with respect to $f_Y(y)$) satisfies conditions that allow us to lower bound the relative error using the aforementioned integral minimization problem. The main difficulty faced in this proof can be summarized as follows. In the proof of Lemma 3.7, we made use of Lemma 3.6 with $\ell = 1$. However, to obtain the improved bounds of this lemma, we use $\ell > 1$, which requires quadrature for functions that are no longer polynomials and, therefore, cannot be represented exactly using Gaussian quadrature. In addition, the required quadrature is for functions whose derivative has a singularity at $x = 1$, and so Proposition 3.8 is not immediately applicable. Instead, we perform a two-part error analysis. In particular, if a function is $C^1[0,\alpha]$, $\alpha < 1$, and monotonic on $[\alpha,1]$, then using Proposition 3.8 on $[0,\alpha]$ and a monotonicity argument on $[\alpha,1]$ results in a bound for the error due to quadrature.

Let $\ell = \left\lfloor\frac{.2495\,\ln n}{\ln\ln n}\right\rfloor$, $k = \lfloor m^{4.004\ell}\rfloor$, and $m = O(\ln n)$. Consider a matrix $A \in S^n_{++}$ with eigenvalues given by $f(x_j)$, $j = 1, ..., k$, each with multiplicity either $\lfloor n/k\rfloor$ or $\lceil n/k\rceil$, where $f(x) = 1-(1-x)^{1/\ell}$ and $x_j = j/k$, $j = 1, ..., k$. Then
$$\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\ge\epsilon\right] = \mathbb{P}_{Y_i\sim\chi^2_1}\left[\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{i=2}^nY_iP^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^nY_iP^2(\lambda_i)}\ge\epsilon\right] = \mathbb{P}_{Y_i\sim\chi^2_1}\left[\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))}\ge\epsilon\right],$$
where $Y_1,...,Y_n\sim\chi^2_1$, and $\overline{Y}_j$ is the sum of the $Y_i$'s that satisfy $\lambda_i = f(x_j)$. Each $\overline{Y}_j$, $j = 1,...,k$, has either $\lfloor n/k\rfloor$ or $\lceil n/k\rceil$ degrees of freedom. Because $m = O(\ln n)$, we have $k = o(n^{.999})$ and, by Proposition 3.3,
$$\mathbb{P}\big[.999\lfloor n/k\rfloor \le \overline{Y}_j \le 1.001\lceil n/k\rceil,\ j = 1,...,k\big] \ge \left(1-\big(.999e^{.001}\big)^{\lfloor n/k\rfloor}-\big(1.001e^{-.001}\big)^{\lceil n/k\rceil}\right)^k = 1-o(1/n).$$
Therefore, with probability $1-o(1/n)$,
$$\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))} \ge .998\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^kP^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^kP^2(f(x_j))}.$$
Let $P(y) = \sum_{r=0}^{m-1}a_rJ_r^{\ell,0}(2y-1)$, $a_r\in\mathbb{R}$, $r = 0,...,m-1$, and define
$$g_{r,s}(x) = J_r^{\ell,0}(2f(x)-1)\,J_s^{\ell,0}(2f(x)-1),$$
$$\bar g_{r,s}(x) = J_r^{\ell,0}(2f(x)-1)\,J_s^{\ell,0}(2f(x)-1)\,(1-f(x)),$$
$r,s = 0,...,m-1$. Then we have
$$\frac{\sum_{j=1}^kP^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^kP^2(f(x_j))} = \frac{\sum_{r,s=0}^{m-1}a_ra_s\sum_{j=1}^k\bar g_{r,s}(x_j)}{\sum_{r,s=0}^{m-1}a_ra_s\sum_{j=1}^kg_{r,s}(x_j)}.$$
We now replace each quadrature sum $\sum_{j=1}^kg_{r,s}(x_j)$ or $\sum_{j=1}^k\bar g_{r,s}(x_j)$ in the previous expression by its corresponding integral, plus a small error term. However, the functions $g_{r,s}$ and $\bar g_{r,s}$ are not elements of $C^1[0,1]$, as $f'(x)$ has a singularity at $x = 1$, and, therefore, we cannot use Proposition 3.8 directly. Instead, we break the error analysis of the quadrature into two components. We have $g_{r,s},\bar g_{r,s}\in C^1[0,\alpha]$ for any $0 < \alpha < 1$, and, if $\alpha$ is chosen to be large enough, both $g_{r,s}$ and $\bar g_{r,s}$ will be monotonic on the interval $[\alpha,1]$. In that case, we can apply Proposition 3.8 to bound the error over the interval $[0,\alpha]$ and use basic properties of Riemann sums of monotonic functions to bound the error over the interval $[\alpha,1]$.

The function $f(x)$ is monotonically increasing on $[0,1]$, and, by equation (iv) of Proposition 3.5 and Proposition 3.9, the function $J_r^{\ell,0}(z)$, $r = 2,...,m-1$, is monotonically increasing on the interval
$$\left[\sqrt{1-\left(\frac{\ell+5/2}{m+\ell-1/2}\right)^2},\ 1\right].\tag{11}$$
The functions $J_0^{\ell,0}(z) = 1$ and $J_1^{\ell,0}(z) = \ell+1+(\ell+2)(z-1)/2$ are clearly monotonically non-decreasing over interval (11) as well. Using the identity $\sqrt{1-y}\le1-y/2$, $y\in[0,1]$, the function $J_r^{\ell,0}(2f(x)-1)$, $r = 0,...,m-1$, is monotonically non-decreasing on the interval
$$\left[1-\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell},\ 1\right].\tag{12}$$
Therefore, the functions $g_{r,s}(x)$ are monotonically non-decreasing and $\bar g_{r,s}(x)$ are monotonically non-increasing on the interval (12) for all $r,s = 0,...,m-1$. The term $\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell} = \omega(1/k)$, and so there exists an index $j^*\in\{1,...,k\}$ such that
$$1-\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell} \le x_{j^*} < 1-\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell}+\frac{1}{k} < 1.$$
We now upper bound the derivatives of $g_{r,s}$ and $\bar g_{r,s}$ on the interval $[0,x_{j^*}]$. By equation (iv) of Proposition 3.5, the first derivatives of $g_{r,s}$ and $\bar g_{r,s}$ are given by
$$g_{r,s}'(x) = 2f'(x)\left(\left[\frac{d}{dz}J_r^{\ell,0}(z)\right]_{z=2f(x)-1}J_s^{\ell,0}(2f(x)-1)+J_r^{\ell,0}(2f(x)-1)\left[\frac{d}{dz}J_s^{\ell,0}(z)\right]_{z=2f(x)-1}\right)$$
$$= \frac{(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}\left((r+\ell+1)J_{r-1}^{\ell+1,1}(2f(x)-1)J_s^{\ell,0}(2f(x)-1)+(s+\ell+1)J_r^{\ell,0}(2f(x)-1)J_{s-1}^{\ell+1,1}(2f(x)-1)\right),$$
and
$$\bar g_{r,s}'(x) = g_{r,s}'(x)(1-x)^{\frac{1}{\ell}}-g_{r,s}(x)\,\frac{(1-x)^{-\frac{\ell-1}{\ell}}}{\ell}.$$
By Proposition 3.5, and the identity $\binom{x}{y}\le\left(\frac{ex}{y}\right)^y$, $x,y\in\mathbb{N}$, $y\le x$,
$$\max_{x\in[0,x_{j^*}]}|g_{r,s}'(x)| \le \frac{(1-x_{j^*})^{-\frac{\ell-1}{\ell}}}{\ell}\left((r+\ell+1)\binom{r+\ell}{\ell+1}\binom{s+\ell}{\ell}+(s+\ell+1)\binom{r+\ell}{\ell}\binom{s+\ell}{\ell+1}\right)$$
$$\le \frac{2(m+\ell)}{\ell}\left(\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell}-\frac{1}{k}\right)^{-\frac{\ell-1}{\ell}}\left(\frac{e(m+\ell-1)}{\ell}\right)^{2\ell+1},$$
and
$$\max_{x\in[0,x_{j^*}]}|\bar g_{r,s}'(x)| \le \max_{x\in[0,x_{j^*}]}|g_{r,s}'(x)|+\frac{(1-x_{j^*})^{-\frac{\ell-1}{\ell}}}{\ell}\binom{r+\ell}{\ell}\binom{s+\ell}{\ell} \le \frac{3(m+\ell)}{\ell}\left(\left(\frac{\ell+5/2}{2m+2\ell-1}\right)^{2\ell}-\frac{1}{k}\right)^{-\frac{\ell-1}{\ell}}\left(\frac{e(m+\ell-1)}{\ell}\right)^{2\ell+1}.$$
Therefore, both $\max_{x\in[0,x_{j^*}]}|g_{r,s}'(x)|$ and $\max_{x\in[0,x_{j^*}]}|\bar g_{r,s}'(x)|$ are $o(m^{4.002\ell})$. Then, by Propositions 3.5 and 3.8, and monotonicity on the interval $[x_{j^*},1]$, we have
$$\left|\frac{1}{k}\sum_{j=1}^kg_{r,s}(x_j)-\int_0^1g_{r,s}(y)\,dy\right| \le \left|\frac{1}{k}\sum_{j=1}^{j^*}g_{r,s}(x_j)-\int_0^{x_{j^*}}g_{r,s}(y)\,dy\right|+\left|\frac{1}{k}\sum_{j=j^*+1}^kg_{r,s}(x_j)-\int_{x_{j^*}}^1g_{r,s}(y)\,dy\right|$$
$$\le \frac{1+o(1)}{2k}\,m^{4.002\ell}+\frac{g_{r,s}(1)}{k} \le \frac{1+o(1)}{2m^{.002\ell}}+\frac{1}{k}\left(\frac{e(m+\ell+1)}{\ell+1}\right)^{2(\ell+1)} \le \frac{1+o(1)}{2m^{.002\ell}},$$
and, similarly,
$$\left|\frac{1}{k}\sum_{j=1}^k\bar g_{r,s}(x_j)-\int_0^1\bar g_{r,s}(y)\,dy\right| \le \left|\frac{1}{k}\sum_{j=1}^{j^*}\bar g_{r,s}(x_j)-\int_0^{x_{j^*}}\bar g_{r,s}(y)\,dy\right|+\left|\frac{1}{k}\sum_{j=j^*+1}^k\bar g_{r,s}(x_j)-\int_{x_{j^*}}^1\bar g_{r,s}(y)\,dy\right|$$
$$\le \frac{1+o(1)}{2m^{.002\ell}}+\frac{\bar g_{r,s}(x_{j^*})}{k} \le \frac{1+o(1)}{2m^{.002\ell}}+\frac{g_{r,s}(1)}{k} \le \frac{1+o(1)}{2m^{.002\ell}}.$$
Let us denote this upper bound by $M = (1+o(1))/(2m^{.002\ell})$. By using the substitution $y = 1-(1-z)^\ell$, followed by a linear change of variable, we have
$$\int_0^1\bar g_{r,s}(y)\,dy = \ell\int_0^1J_r^{\ell,0}(2z-1)J_s^{\ell,0}(2z-1)(1-z)^\ell\,dz = \frac{\ell}{2^{\ell+1}}\,\langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell,0}$$
and, similarly,
$$\int_0^1g_{r,s}(y)\,dy = \ell\int_0^1J_r^{\ell,0}(2z-1)J_s^{\ell,0}(2z-1)(1-z)^{\ell-1}\,dz = \frac{\ell}{2^\ell}\,\langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell-1,0}.$$
Then
$$\max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^kP^2(f(x_j))}{\sum_{j=1}^kP^2(f(x_j))(1-f(x_j))} \le \max_{\substack{a\in\mathbb{R}^m,\,a\ne0\\|\epsilon_{r,s}|\le M,\,|\bar\epsilon_{r,s}|\le M}}\frac{\sum_{r,s=0}^{m-1}a_ra_s\left[\frac{\ell}{2^\ell}\langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell-1,0}+\epsilon_{r,s}\right]}{\sum_{r,s=0}^{m-1}a_ra_s\left[\frac{\ell}{2^{\ell+1}}\langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell,0}+\bar\epsilon_{r,s}\right]}.$$
Letting $\bar a_r = a_r\|J_r^{\ell,0}\|_{\ell,0}$, $r = 0,...,m-1$, and noting that, by Proposition 3.5, $\|J_r^{\ell,0}\|_{\ell,0} = \left(\frac{2^{\ell+1}}{2r+\ell+1}\right)^{1/2}$, we obtain the bound
$$\max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^kP^2(f(x_j))}{\sum_{j=1}^kP^2(f(x_j))(1-f(x_j))} \le 2\max_{\substack{\bar a\in\mathbb{R}^m,\,\bar a\ne0\\|\epsilon_{r,s}|\le M,\,|\bar\epsilon_{r,s}|\le M}}\frac{\sum_{r,s=0}^{m-1}\bar a_r\bar a_s\left[\frac{\langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell-1,0}}{\|J_r^{\ell,0}\|_{\ell,0}\|J_s^{\ell,0}\|_{\ell,0}}+\frac{(2r+\ell+1)^{1/2}(2s+\ell+1)^{1/2}}{2\ell}\,\epsilon_{r,s}\right]}{\sum_{r=0}^{m-1}\bar a_r^2+\sum_{r,s=0}^{m-1}\bar a_r\bar a_s\,\frac{(2r+\ell+1)^{1/2}(2s+\ell+1)^{1/2}}{\ell}\,\bar\epsilon_{r,s}}.$$
Let $B\in\mathbb{R}^{m\times m}$ satisfy $B(r,s) = \langle J_r^{\ell,0},J_s^{\ell,0}\rangle_{\ell-1,0}\,\|J_r^{\ell,0}\|^{-1}_{\ell,0}\|J_s^{\ell,0}\|^{-1}_{\ell,0}$, $r,s = 0,...,m-1$. From the proof of Lemma 3.6, we have that
$$\max_{\substack{\bar a\in\mathbb{R}^m\\\bar a\ne0}}\frac{\bar a^TB\bar a}{\bar a^T\bar a} \le \frac{m(m+\ell)}{2\ell}.$$
This implies that
$$\max_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^kP^2(f(x_j))}{\sum_{j=1}^kP^2(f(x_j))(1-f(x_j))} \le 2\max_{\substack{\bar a\in\mathbb{R}^m\\\bar a\ne0}}\frac{\bar a^T\big(B+\epsilon\mathbf{1}\mathbf{1}^T\big)\bar a}{\bar a^T\big(I-\epsilon\mathbf{1}\mathbf{1}^T\big)\bar a} \le \frac{2}{1-\epsilon m}\max_{\substack{\bar a\in\mathbb{R}^m\\\bar a\ne0}}\left[\frac{\bar a^TB\bar a}{\bar a^T\bar a}\right]+\frac{\epsilon m}{1-\epsilon m} \le \frac{1}{1-\epsilon m}\,\frac{m(m+\ell)}{\ell}+\frac{\epsilon m}{1-\epsilon m},$$
where
$$\epsilon = \frac{M(2m+\ell-1)}{\ell} = (1+o(1))\,\frac{2m+\ell-1}{2m^{.002\ell}\,\ell} = o(1/m).$$
This implies that, with probability $1-o(1/n)$,
$$\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))(1-f(x_j))}{\sum_{j=1}^k\overline{Y}_jP^2(f(x_j))} \ge (1-o(1))\,\frac{.998\,\ell}{m(m+\ell)} \ge \frac{.249\,\ln n}{m^2\ln\ln n}.$$
This completes the proof. □

In the previous proof, a number of very small constants were used, exclusively for the purpose of obtaining an estimate with constant as close to $1/4$ as possible. These very small constants can be replaced by more practical numbers (that begin to exhibit asymptotic convergence for reasonably sized $n$), at the cost of a worse constant (say, $1/10$ instead of $.249$). This completes the analysis of asymptotic lower bounds. In the following section, we produce improved upper bounds. The lower bounds of this section, paired with the upper bounds in the following section, will complete the proofs of Theorems 1.2 and 1.3, the main results of the paper.

4. Improved Upper Bounds

To complement the asymptotic lower bounds of the previous section, we prove upper bounds for relative error. Our estimate for relative error in the one-norm is of the same order as the estimate (4), but with an improved constant. However, our technique is much more robust, and results in significantly improved estimates over (6) for all $p$-norms. Our technique for obtaining these estimates differs from the technique of [9] in one key way. Rather than integrating first by $b_1$ and using properties of the arctan function, we replace the ball $B^n$ by $n$ chi-square random variables on $[0,\infty)^n$, and iteratively apply Cauchy–Schwarz to our relative error until we obtain an exponent $c$ for which the inverse chi-square distribution with one degree of freedom has a convergent $c$th moment. We prove the following lemma.

Lemma 4.1. Let $n \ge 100$, $m \ge 10$, and $p \ge 1$. Then
$$\max_{A\in S^n_{++}}\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{1/p} \le .068\,\frac{\ln^2\!\big(n(m-1)^{8p}\big)}{(m-1)^2}.$$

Proof. By repeated application of Cauchy–Schwarz,
$$\frac{\sum_{i=1}^ny_iQ^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^ny_iQ^2(\lambda_i)} \le \left[\frac{\sum_{i=1}^ny_iQ^2(\lambda_i)(1-\lambda_i)^2}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right]^{\frac12} \le \left[\frac{\sum_{i=1}^ny_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right]^{\frac{1}{2^q}}$$
for $q\in\mathbb{N}$. Choosing $q$ to satisfy $2p < 2^q \le 4p$, and using equation (8) with polynomial normalization $Q(1) = 1$, we have
$$\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{\frac1p} \le \min_{Q\in\mathcal{P}_{m-1}(1)}\left[\int_{[0,\infty)^n}\left(\frac{\sum_{i=2}^ny_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac{p}{2^q}}f_Y(y)\,dy\right]^{\frac1p}$$
$$\le \min_{Q\in\mathcal{P}_{m-1}(1)}\left[\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i<\beta}y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac{p}{2^q}}f_Y(y)\,dy\right]^{\frac1p}+\left[\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i\ge\beta}y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac{p}{2^q}}f_Y(y)\,dy\right]^{\frac1p}$$
for any $\beta\in(0,1)$. We treat the two terms on the right-hand side separately, beginning with the second. The second term is always bounded above by
$$\left[\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i\ge\beta}y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac{p}{2^q}}f_Y(y)\,dy\right]^{\frac1p} \le 1-\beta.$$
The integrand of the first term satisfies
$$\left(\frac{\sum_{i:\lambda_i<\beta}y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac{p}{2^q}} \le \left(\frac{\sum_{i:\lambda_i<\beta}y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^ny_iQ^2(\lambda_i)}\right)^{\frac14} \le \max_{x\in[0,\beta]}|Q(x)|^{\frac12}(1-x)^{2^{q-2}}\left(\frac{\sum_{i:\lambda_i<\beta}y_i}{y_1}\right)^{\frac14}.$$
We replace the minimizing polynomial in $\mathcal{P}_{m-1}(1)$ by $Q(x) = \dfrac{T_{m-1}\!\left(\frac{2x}{\beta}-1\right)}{T_{m-1}\!\left(\frac{2}{\beta}-1\right)}$, where $T_{m-1}(\cdot)$ is the Chebyshev polynomial of the first kind. The Chebyshev polynomials $T_{m-1}(\cdot)$ are bounded by one in magnitude on the interval $[-1,1]$, and this bound is tight at the endpoints. Therefore, our maximum is achieved at $x = 0$, and
$$\max_{x\in[0,\beta]}|Q(x)|^{\frac12}(1-x)^{2^{q-2}} = \left|T_{m-1}\!\left(\frac{2}{\beta}-1\right)\right|^{-\frac12}.$$
By the definition $T_{m-1}(x) = \frac12\left(\big(x-\sqrt{x^2-1}\big)^{m-1}+\big(x+\sqrt{x^2-1}\big)^{m-1}\right)$, $|x|\ge1$, and the standard identity $e^{2x}\le\frac{1+x}{1-x}$, $x\in[0,1]$,
$$T_{m-1}\!\left(\frac{2}{\beta}-1\right) \ge \frac12\left(\frac{2}{\beta}-1+\sqrt{\left(\frac{2}{\beta}-1\right)^2-1}\right)^{m-1} = \frac12\left(\frac{1+\sqrt{1-\beta}}{1-\sqrt{1-\beta}}\right)^{m-1} \ge \frac12\exp\left\{2\sqrt{1-\beta}\,(m-1)\right\}.$$
In addition,
$$\int_{[0,\infty)^n}\left(\frac{\sum_{i=2}^ny_i}{y_1}\right)^{\frac14}f_Y(y)\,dy = \frac{\Gamma(n/2-1/4)\,\Gamma(1/4)}{\Gamma(n/2-1/2)\,\Gamma(1/2)} \le \frac{\Gamma(1/4)}{2^{1/4}\,\Gamma(1/2)}\,n^{1/4},$$
which gives us
$$\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{1/p} \le \left[\frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\,n^{1/4}\right]^{1/p}e^{-\gamma(m-1)/p}+\gamma^2,$$
where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-1}\ln\big(n^{1/4p}(m-1)^2\big)$ (assuming $\gamma < 1$; otherwise our bound is already greater than one, and trivially holds), we obtain
$$\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)}\right)^p\right]^{1/p} \le \frac{\left(\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}\right)^{1/p}+p^2\ln^2\big(n^{1/4p}(m-1)^2\big)}{(m-1)^2} = \frac{\left(\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}\right)^{1/p}+\frac{1}{16}\ln^2\big(n(m-1)^{8p}\big)}{(m-1)^2} \le .068\,\frac{\ln^2\big(n(m-1)^{8p}\big)}{(m-1)^2},$$
for $m\ge10$, $n\ge100$. This completes the proof. □

A very similar proof, paired with probabilistic bounds on the quantity $\sum_{i=2}^nY_i/Y_1$, where $Y_1,...,Y_n\sim\chi^2_1$, gives the following probabilistic estimate.

Lemma 4.2.
$$\sup_{A\in S^n_{++}}\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} \le .571\,\frac{\ln^2\!\big(n(m-1)^{2/3}\big)}{(m-1)^2}\right] \ge 1-o(1/n).$$


Proof. By the proof of Lemma 4.1, we have
$$\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} = \min_{Q\in\mathcal{P}_{m-1}(1)}\frac{\sum_{i=2}^nY_iQ^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^nY_iQ^2(\lambda_i)} \le \min_{Q\in\mathcal{P}_{m-1}(1)}\left[\max_{x\in[0,\beta]}Q^2(x)(1-x)\right]\frac{\sum_{i=2}^nY_i}{Y_1}+(1-\beta)$$
$$\le T_{m-1}^{-2}\!\left(\frac{2}{\beta}-1\right)\frac{\sum_{i=2}^nY_i}{Y_1}+(1-\beta) \le 4\exp\left\{-4\sqrt{1-\beta}\,(m-1)\right\}\frac{\sum_{i=2}^nY_i}{Y_1}+(1-\beta).$$
By Proposition 3.3,
$$\mathbb{P}\left[\frac{\sum_{i=2}^nY_i}{Y_1}\ge n^{3.02}\right] \le \mathbb{P}\big[Y_1\le n^{-2.01}\big]+\mathbb{P}\left[\sum_{i=2}^nY_i\ge n^{1.01}\right] \le \left(\frac{e}{n^{2.01}}\right)^{1/2}+\left(n^{.01}e^{1-n^{.01}}\right)^{(n-1)/2} = o(1/n).$$
Let $\sqrt{1-\beta} = \dfrac{\ln\big(n^{3.02}(m-1)^2\big)}{4(m-1)}$. Then, with probability $1-o(1/n)$,
$$\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} \le \frac{4}{(m-1)^2}+\frac{\ln^2\big(n^{3.02}(m-1)^2\big)}{16(m-1)^2}.$$
The $4/(m-1)^2$ term is dominated by the log term as $n$ increases, and, therefore, with probability $1-o(1/n)$,
$$\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)} \le .571\,\frac{\ln^2\!\big(n(m-1)^{2/3}\big)}{(m-1)^2}. \qquad \Box$$

Combining the lower bounds in Lemmas 3.4, 3.7, and 3.10, the upper bounds in Lemmas 4.1 and 4.2, and the observations from Section 2 regarding the extension from $S^n_{++}$ to $S^n$, completes the proofs of Theorems 1.2 and 1.3, the main results of the paper.

5. Distribution Dependent Bounds

In this section, we consider improved estimates for matrices with certain specific properties. First, we show that if a matrix $A$ has a reasonable number of eigenvalues near its spectral radius, then we can produce an error estimate that no longer depends on the dimension of the matrix, and depends only on the number of iterations $m$. In particular, we suppose that the eigenvalues of our matrix $A$ are such that, once scaled, there are at least $n/(m-1)^{\alpha}$ eigenvalues in the range $[\beta, 1]$, for a specific value of $\beta$ satisfying $1-\beta = O\big(m^{-2}\ln^2 m\big)$. For a large number of matrices for which the Lanczos method is used, this assumption holds true. Under this assumption, we prove the following error estimate for $A$.

Theorem 5.1. Let $A \in S^n$, $m \ge 10$, $p \ge 1$, $\alpha > 0$, and $n \ge m(m-1)^{\alpha}$. If
\[
\#\left\{\lambda_i(A) \;\middle|\; \frac{\lambda_1(A)-\lambda_i(A)}{\lambda_1(A)-\lambda_n(A)} \le \left(\frac{(2p+\alpha/4)\ln(m-1)}{m-1}\right)^{2}\right\} \ge \frac{n}{(m-1)^{\alpha}},
\]
then
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right)^{p}\right]^{1/p} \le .077\,\frac{(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2}.
\]
In addition, if
\[
\#\left\{\lambda_i(A) \;\middle|\; \frac{\lambda_1(A)-\lambda_i(A)}{\lambda_1(A)-\lambda_n(A)} \le \left(\frac{(\alpha+2)\ln(m-1)}{4(m-1)}\right)^{2}\right\} \ge \frac{n}{(m-1)^{\alpha}},
\]
then
\[
\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \le .126\,\frac{(\alpha+2)^2\ln^2(m-1)}{(m-1)^2}\right] \ge 1-O(e^{-m}).
\]

Proof. We begin by bounding the expected relative error. The main idea is to proceed as in the proof of Lemma 4.1, but take advantage of the number of eigenvalues near $\lambda_1$. For simplicity, let $\lambda_1 = 1$ and $\lambda_n = 0$. We consider eigenvalues in the ranges $[0, 2\beta-1)$ and $[2\beta-1, 1]$ separately, $1/2 < \beta < 1$, and then make use of the lower bound for the number of eigenvalues in $[\beta, 1]$. From the proof of Lemma 4.1, we have
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right)^{p}\right]^{1/p} \le \min_{Q\in\mathcal{P}_{m-1}(1)}\left\{\left[\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i<2\beta-1} y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^n y_iQ^2(\lambda_i)}\right)^{p/2^q} f_Y(y)\,dy\right]^{1/p} + \left[\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i\ge 2\beta-1} y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^n y_iQ^2(\lambda_i)}\right)^{p/2^q} f_Y(y)\,dy\right]^{1/p}\right\},
\]

where $q \in \mathbb{N}$, $2p < 2^q \le 4p$, and $\beta \in (1/2, 1)$. The second term is at most $2(1-\beta)$, and the integrand of the first term is bounded above by
\[
\left(\frac{\sum_{i:\lambda_i<2\beta-1} y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^n y_iQ^2(\lambda_i)}\right)^{p/2^q} \le \left(\frac{\sum_{i:\lambda_i<2\beta-1} y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i=1}^n y_iQ^2(\lambda_i)}\right)^{1/4} \le \left(\frac{\sum_{i:\lambda_i<2\beta-1} y_iQ^2(\lambda_i)(1-\lambda_i)^{2^q}}{\sum_{i:\lambda_i\ge\beta} y_iQ^2(\lambda_i)}\right)^{1/4}
\]
\[
\le \frac{\max_{x\in[0,2\beta-1]} |Q(x)|^{1/2}(1-x)^{2^{q-2}}}{\min_{x\in[\beta,1]} |Q(x)|^{1/2}}\left(\frac{\sum_{i:\lambda_i<2\beta-1} y_i}{\sum_{i:\lambda_i\ge\beta} y_i}\right)^{1/4}.
\]

Let $\sqrt{1-\beta} = \frac{(2p+\alpha/4)\ln(m-1)}{m-1} < 1/4$ (if $\beta \le 1/2$, then our estimate is a trivial one, and we are already done). By the condition of the theorem, there are at least $n/(m-1)^{\alpha}$ eigenvalues in the interval $[\beta, 1]$. Therefore,

\[
\int_{[0,\infty)^n}\left(\frac{\sum_{i:\lambda_i<2\beta-1} y_i}{\sum_{i:\lambda_i\ge\beta} y_i}\right)^{1/4} f_Y(y)\,dy \le \mathbb{E}_{Y\sim\chi^2_n}\big[Y^{1/4}\big]\;\mathbb{E}_{Y\sim\chi^2_{\lceil n/(m-1)^{\alpha}\rceil}}\big[Y^{-1/4}\big] = \frac{\Gamma(n/2+1/4)\,\Gamma\big(\lceil n/(m-1)^{\alpha}\rceil/2-1/4\big)}{\Gamma(n/2)\,\Gamma\big(\lceil n/(m-1)^{\alpha}\rceil/2\big)}
\]
\[
\le \frac{\Gamma\big(m(m-1)^{\alpha}/2+1/4\big)\,\Gamma(m/2-1/4)}{\Gamma\big(m(m-1)^{\alpha}/2\big)\,\Gamma(m/2)} \le 1.04\,(m-1)^{\alpha/4}
\]

for $n \ge m(m-1)^{\alpha}$ and $m \ge 10$. Replacing the minimizing polynomial by $Q(x) = T_{m-1}\big(\tfrac{2x}{2\beta-1}-1\big)\big/\,T_{m-1}\big(\tfrac{2}{2\beta-1}-1\big)$, we obtain
\[
\frac{\max_{x\in[0,2\beta-1]}\big|Q(x)\big|^{1/2}(1-x)^{2^{q-2}}}{\min_{x\in[\beta,1]}\big|Q(x)\big|^{1/2}} = \frac{1}{T_{m-1}^{1/2}\big(\frac{2\beta}{2\beta-1}-1\big)} \le \frac{1}{T_{m-1}^{1/2}\big(\frac{2}{\beta}-1\big)} \le \sqrt{2}\,\exp\!\big\{-\sqrt{1-\beta}\,(m-1)\big\}.
\]


Combining our estimates results in the bound
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right)^{p}\right]^{1/p} \le \big(1.04\sqrt{2}\,(m-1)^{\alpha/4}\big)^{1/p}\, e^{-\sqrt{1-\beta}\,(m-1)/p} + 2(1-\beta) = \frac{(1.04\sqrt{2})^{1/p}}{(m-1)^2} + \frac{(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2} \le .077\,\frac{(2p+\alpha/4)^2\ln^2(m-1)}{(m-1)^2}
\]

for $m \ge 10$, $p \ge 1$, and $\alpha > 0$. This completes the bound for expected relative error. We now produce a probabilistic bound for relative error in a similar fashion to the proof of Lemma 4.2. Let $\sqrt{1-\beta} = \frac{(\alpha+2)\ln(m-1)}{4(m-1)}$. We have

\[
\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} = \min_{Q\in\mathcal{P}_{m-1}(1)} \frac{\sum_{i=2}^n Y_iQ^2(\lambda_i)(1-\lambda_i)}{\sum_{i=1}^n Y_iQ^2(\lambda_i)} \le \min_{Q\in\mathcal{P}_{m-1}(1)} \frac{\max_{x\in[0,2\beta-1]}Q^2(x)(1-x)}{\min_{x\in[\beta,1]}Q^2(x)}\;\frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta)
\]
\[
\le T_{m-1}^{-2}\!\left(\frac{2}{\beta}-1\right)\frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta) \le 4\exp\!\big\{-4\sqrt{1-\beta}\,(m-1)\big\}\,\frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} + 2(1-\beta).
\]

By Proposition 3.3,
\[
\mathbb{P}\left[\frac{\sum_{i:\lambda_i<2\beta-1} Y_i}{\sum_{i:\lambda_i\ge\beta} Y_i} \ge 4(m-1)^{\alpha}\right] \le \mathbb{P}\Big[\sum_{i:\lambda_i<2\beta-1} Y_i \ge 2n\Big] + \mathbb{P}\Big[\sum_{i:\lambda_i\ge\beta} Y_i \le \frac{n}{2(m-1)^{\alpha}}\Big] \le (2/e)^{n/2} + \big(\sqrt{e}/2\big)^{n/2(m-1)^{\alpha}} = O(e^{-m}).
\]

Then, with probability $1-O(e^{-m})$,
\[
\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \le 16(m-1)^{\alpha}\,e^{-4\sqrt{1-\beta}(m-1)} + 2(1-\beta) = \frac{16}{(m-1)^2} + \frac{(\alpha+2)^2\ln^2(m-1)}{8(m-1)^2}.
\]

The $16/(m-1)^2$ term is dominated by the log term as $m$ increases, and, therefore, with probability $1-O(e^{-m})$,
\[
\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \le .126\,\frac{(\alpha+2)^2\ln^2(m-1)}{(m-1)^2}. \qquad\square
\]

The above theorem shows that, for matrices whose distribution of eigenvalues is independent of $n$, we can obtain dimension-free estimates. For example, the above theorem holds with $\alpha = 2$ for the matrix $A$ from Example 1.1.
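For a concrete spectrum, the hypothesis of Theorem 5.1 is easy to test directly. The following snippet is an illustration of ours (the name `satisfies_theorem51` is hypothetical): it counts the scaled eigenvalues inside the required window and compares the count with $n/(m-1)^{\alpha}$.

```python
import numpy as np

def satisfies_theorem51(lams, m, p, alpha):
    """Check the expectation hypothesis of Theorem 5.1 for a spectrum lams:
    enough eigenvalues within ((2p + alpha/4) ln(m-1)/(m-1))^2 of lambda_1,
    after scaling by lambda_1 - lambda_n, plus the size condition on n."""
    n = len(lams)
    lam1, lamn = lams.max(), lams.min()
    window = ((2 * p + alpha / 4) * np.log(m - 1) / (m - 1)) ** 2
    count = np.sum((lam1 - lams) / (lam1 - lamn) <= window)
    return count >= n / (m - 1) ** alpha and n >= m * (m - 1) ** alpha

# usage: a uniform spectrum on [0, 1] satisfies the hypothesis with alpha = 1
n, m = 10**6, 25
print(satisfies_theorem51(np.linspace(0.0, 1.0, n), m, p=1, alpha=1.0))
```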

When a matrix $A$ has eigenvalues known to converge to a limiting distribution as the dimension increases, or a random matrix $X_n$ exhibits suitable convergence of its empirical spectral distribution $L_{X_n} := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(X_n)}$, improved estimates can be obtained by simply estimating the corresponding integral polynomial minimization problem. However, to do so, we first require a law of large numbers for weighted sums of independent identically distributed (i.i.d.) random variables. We recall the following result.

Proposition 5.2 ([1]). Let $a_1, a_2, \ldots \in [a,b]$ and let $X_1, X_2, \ldots$ be i.i.d. random variables with $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] < \infty$. Then $\frac{1}{n}\sum_{i=1}^n a_iX_i \to 0$ almost surely.

We present the following theorem regarding the error of random matrices that exhibit suitable convergence.

Theorem 5.3. Let $X_n \in S^n$, $n = 1, 2, \ldots$, be a sequence of random matrices such that $L_{X_n}$ converges in probability to $\sigma(x)\,dx$, where $\sigma(x) \in C([a,b])$, $a, b \in \operatorname{supp}(\sigma)$. Then, for all $m \in \mathbb{N}$ and $\varepsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{P}\left[\;\left|\,\frac{\lambda_1(X_n)-\lambda_1^{(m)}(X_n,b)}{\lambda_1(X_n)-\lambda_n(X_n)} \;-\; \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\int_a^b P^2(x)\big(\frac{b-x}{b-a}\big)\,\sigma(x)\,dx}{\int_a^b P^2(x)\,\sigma(x)\,dx}\,\right| > \varepsilon\;\right] = 0.
\]

Proof. The main idea of the proof is to use Proposition 5.2 to control the behavior of $Y$, and the convergence of $L_{X_n}$ to $\sigma(x)\,dx$ to show convergence to our integral minimization problem. We first write our polynomial $P \in \mathcal{P}_{m-1}$ as $P(x) = \sum_{j=0}^{m-1}\alpha_j x^j$ and our unnormalized error as
\[
\lambda_1(X_n)-\lambda_1^{(m)}(X_n,b) = \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\sum_{i=2}^n Y_iP^2(\lambda_i)(\lambda_1-\lambda_i)}{\sum_{i=1}^n Y_iP^2(\lambda_i)} = \min_{\substack{\alpha\in\mathbb{R}^m\\\alpha\ne0}} \frac{\sum_{j_1,j_2=0}^{m-1}\alpha_{j_1}\alpha_{j_2}\sum_{i=2}^n Y_i\lambda_i^{j_1+j_2}(\lambda_1-\lambda_i)}{\sum_{j_1,j_2=0}^{m-1}\alpha_{j_1}\alpha_{j_2}\sum_{i=1}^n Y_i\lambda_i^{j_1+j_2}},
\]

where $Y_1, \ldots, Y_n$ are i.i.d. chi-square random variables with one degree of freedom each. The functions $x^j$, $j = 0, \ldots, 2m-2$, are bounded on $[a,b]$, and so, by Proposition 5.2, for any $\varepsilon_1, \varepsilon_2 > 0$,
\[
\left|\frac{1}{n}\sum_{i=2}^n Y_i\lambda_i^{j}(\lambda_1-\lambda_i) - \frac{1}{n}\sum_{i=2}^n \lambda_i^{j}(\lambda_1-\lambda_i)\right| < \varepsilon_1, \qquad j = 0, \ldots, 2m-2,
\]
and
\[
\left|\frac{1}{n}\sum_{i=1}^n Y_i\lambda_i^{j} - \frac{1}{n}\sum_{i=1}^n \lambda_i^{j}\right| < \varepsilon_2, \qquad j = 0, \ldots, 2m-2,
\]

with probability $1-o(1)$. $L_{X_n}$ converges in probability to $\sigma(x)\,dx$, and so, for any $\varepsilon_3, \varepsilon_4 > 0$,
\[
\left|\frac{1}{n}\sum_{i=2}^n \lambda_i^{j}(\lambda_1-\lambda_i) - \int_a^b x^{j}(b-x)\,\sigma(x)\,dx\right| < \varepsilon_3, \qquad j = 0, \ldots, 2m-2,
\]
and
\[
\left|\frac{1}{n}\sum_{i=1}^n \lambda_i^{j} - \int_a^b x^{j}\,\sigma(x)\,dx\right| < \varepsilon_4, \qquad j = 0, \ldots, 2m-2,
\]

with probability $1-o(1)$. This implies that
\[
\lambda_1(X_n)-\lambda_1^{(m)}(X_n,b) = \min_{\substack{\alpha\in\mathbb{R}^m\\\alpha\ne0}} \frac{\sum_{j_1,j_2=0}^{m-1}\alpha_{j_1}\alpha_{j_2}\left[\int_a^b x^{j_1+j_2}(b-x)\,\sigma(x)\,dx + \widetilde{E}(j_1,j_2)\right]}{\sum_{j_1,j_2=0}^{m-1}\alpha_{j_1}\alpha_{j_2}\left[\int_a^b x^{j_1+j_2}\,\sigma(x)\,dx + E(j_1,j_2)\right]},
\]
where $|\widetilde{E}(j_1,j_2)| < \varepsilon_1+\varepsilon_3$ and $|E(j_1,j_2)| < \varepsilon_2+\varepsilon_4$, $j_1, j_2 = 0, \ldots, m-1$, with probability $1-o(1)$. The minimization problem

\[
\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\int_a^b P^2(x)\big(\frac{b-x}{b-a}\big)\,\sigma(x)\,dx}{\int_a^b P^2(x)\,\sigma(x)\,dx}
\]
corresponds to a generalized Rayleigh quotient $\frac{\alpha^TA\alpha}{\alpha^TB\alpha}$, where $A, B \in S^m_{++}$ and $\lambda_{\max}(A)$, $\lambda_{\max}(B)$, and $\lambda_{\min}(B)$ are all constants independent of $n$, and $\alpha = (\alpha_0, \ldots, \alpha_{m-1})^T$. By choosing $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4$ sufficiently


small,
\[
\left|\frac{\alpha^T(A+\widetilde{E})\alpha}{\alpha^T(B+E)\alpha} - \frac{\alpha^TA\alpha}{\alpha^TB\alpha}\right| = \left|\frac{(\alpha^T\widetilde{E}\alpha)\,\alpha^TB\alpha - (\alpha^TE\alpha)\,\alpha^TA\alpha}{(\alpha^TB\alpha + \alpha^TE\alpha)\,\alpha^TB\alpha}\right| \le \frac{(\varepsilon_1+\varepsilon_3)\,m\,\lambda_{\max}(B) + (\varepsilon_2+\varepsilon_4)\,m\,\lambda_{\max}(A)}{\big(\lambda_{\min}(B)-(\varepsilon_2+\varepsilon_4)m\big)\,\lambda_{\min}(B)} \le \varepsilon
\]
for all $\alpha \in \mathbb{R}^m$ with probability $1-o(1)$. This completes the proof. □

The above theorem is a powerful tool for explicitly computing the error in the Lanczos method for certain types of matrices, as the computation of the extremal eigenvalue of an $m \times m$ matrix is a nominal computation compared to one application of an $n$-dimensional matrix. Of course, Theorem 5.3 is equally applicable for estimating $\lambda_n$. As an example, we apply the above theorem to the class of Wigner matrices.

Example 5.4. A random matrix $X_n = n^{-1/2}Y_n \in S^n$ is said to be a Wigner matrix with second moment $t$ if $Y_n(i,j)$, $1 \le i < j \le n$, are i.i.d. random variables with mean zero and variance $t$, and $Y_n(i,i)$, $i = 1, \ldots, n$, are i.i.d. random variables with mean zero and finite variance. The empirical spectral distribution $L_{X_n}$ converges in probability to the semicircle distribution $\frac{1}{2\pi t}\sqrt{(4t-x^2)_+}\,dx$. By Theorem 5.3, the error approaches

\[
\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\int_{-2\sqrt{t}}^{2\sqrt{t}} P^2(x)\big(2\sqrt{t}-x\big)\sqrt{4t-x^2}\,dx}{4\sqrt{t}\int_{-2\sqrt{t}}^{2\sqrt{t}} P^2(x)\sqrt{4t-x^2}\,dx} = \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}} \frac{\int_{-1}^{1} P^2(y)(1-y)^{3/2}(1+y)^{1/2}\,dy}{2\int_{-1}^{1} P^2(y)(1-y)^{1/2}(1+y)^{1/2}\,dy}.
\]
The upper and lower bounds of Lemma 3.6 immediately imply that Wigner matrices exhibit an asymptotic error in the range
\[
\left[\frac{3}{2m(m+2)},\ \frac{99\sqrt{11}}{16\,m(m+2)}\right].
\]
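The limiting error in Theorem 5.3, and hence the quantity estimated in Example 5.4, can be computed numerically by writing the minimization in the monomial basis and solving the resulting generalized eigenvalue problem. The sketch below is ours (it assumes SciPy is available, and the helper name `limiting_error` is hypothetical); for larger $m$, an orthogonal polynomial basis would be numerically safer than the monomial one, whose moment matrices become ill-conditioned.

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import eigh

def limiting_error(sigma, a, b, m):
    """Limiting relative error of Theorem 5.3: the minimum of the generalized
    Rayleigh quotient over nonzero P in P_{m-1}, written in the monomial
    basis, so that A and B below are m x m moment matrices of sigma."""
    A = np.empty((m, m)); B = np.empty((m, m))
    for j1 in range(m):
        for j2 in range(j1, m):
            B[j1, j2] = B[j2, j1] = quad(lambda x: x**(j1 + j2) * sigma(x), a, b)[0]
            A[j1, j2] = A[j2, j1] = quad(
                lambda x: x**(j1 + j2) * (b - x) / (b - a) * sigma(x), a, b)[0]
    return eigh(A, B, eigvals_only=True)[0]   # smallest generalized eigenvalue

# usage: semicircle density with t = 1 on [-2, 2], cf. Example 5.4
semi = lambda x: np.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * np.pi)
for m in (5, 10):
    print(m, limiting_error(semi, -2.0, 2.0, m))   # decays on the order 1/(m(m+2))
```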

Example 5.4 provides further evidence that the $m^{-2}$ dependence is not just a worst-case bound for the Lanczos method, but actually a precise estimate of typical error for a large number of matrices. The choice of Wigner matrices in Example 5.4 is a fairly arbitrary one, as the above technique is applicable to a large class of matrices. In general, if $\sigma(x)$ can be bounded above and below by some constants times a shifted Jacobi weight function $(b-x)^{\alpha-1}(x-a)^{\beta}$, then, by Lemma 3.6, we can conclude that the corresponding matrix exhibits an order $m^{-2}$ convergence asymptotically. We state this formally in the following corollary.

Corollary 5.5. Let $X_n \in S^n$, $n = 1, 2, \ldots$, be a sequence of random matrices such that $L_{X_n}$ converges in probability to $\sigma(x)\,dx$, where $\sigma(x) \in C([a,b])$ satisfies
\[
C_1(b-x)^{\alpha-1}(x-a)^{\beta} \le \sigma(x) \le C_2(b-x)^{\alpha-1}(x-a)^{\beta}, \qquad x \in [a,b],
\]
for some $\alpha \ge 1$, $\beta \ge 0$, and $C_1, C_2 > 0$. Then, for all $m \in \mathbb{N}$ and $\varepsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{P}\left(\left|\frac{\lambda_1(X_n)-\lambda_1^{(m)}(X_n,b)}{\lambda_1(X_n)-\lambda_n(X_n)} - \frac{C_m}{m(m+\alpha+\beta)}\right| > \varepsilon\right) = 0,
\]
for some constant
\[
C_m \in \left[\frac{C_1\,\alpha}{C_2},\ \frac{3\,C_2\,\alpha\,(2+\alpha/2)^{\alpha}}{C_1}\right].
\]

Proof. By the condition $C_1(b-x)^{\alpha-1}(x-a)^{\beta} \le \sigma(x) \le C_2(b-x)^{\alpha-1}(x-a)^{\beta}$ and a suitable change of variable,
\[
\frac{C_1}{2C_2}\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\|P\|^2_{\alpha,\beta}}{\|P\|^2_{\alpha-1,\beta}} \le \min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\int_a^b P^2(x)\big(\frac{b-x}{b-a}\big)\,\sigma(x)\,dx}{\int_a^b P^2(x)\,\sigma(x)\,dx} \le \frac{C_2}{2C_1}\min_{\substack{P\in\mathcal{P}_{m-1}\\P\ne0}}\frac{\|P\|^2_{\alpha,\beta}}{\|P\|^2_{\alpha-1,\beta}}.
\]
The result follows immediately by applying Lemma 3.6 and Theorem 5.3. □


6. Estimates for Arbitrary Eigenvalues and Condition Number

Up to this point, we have concerned ourselves almost exclusively with the extremal eigenvalues $\lambda_1$ and $\lambda_n$ of a matrix. In this section, we extend the techniques of this paper to arbitrary eigenvalues, and also obtain bounds for the condition number of a positive definite matrix. The results of this section provide the first uniform error estimates for arbitrary eigenvalues and the first lower bounds for the relative error with respect to the condition number. Lower bounds for arbitrary eigenvalues follow relatively quickly from our previous work. However, our proof technique for upper bounds requires some mild assumptions regarding the eigenvalue gaps of the matrix. We begin with asymptotic lower bounds for an arbitrary eigenvalue, and present the following corollary of Theorems 1.2 and 1.3.

Corollary 6.1.
\[
\max_{A\in S^n}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge 1-o(1)\right] \ge 1-o(1/n) \qquad\text{for } m = o(\ln n),
\]
\[
\max_{A\in S^n}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge (1-o(1))(1-\alpha)\right] \ge 1-o(1/n) \qquad\text{for } m \le \frac{\ln n - \ln(\ln^2 n)}{4\ln\!\big(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\big)},\ \alpha \in (0,1],
\]
\[
\max_{A\in S^n}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{.249\,\ln n}{m^2\ln\ln n}\right] \ge 1-o(1/n) \qquad\text{for } m = O(\ln n),
\]
\[
\max_{A\in S^n}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{.637}{m^2}\right] \ge 1-o(1/n) \qquad\text{for } m = o\!\left(\frac{n^{1/2}}{\ln^{1/2}n}\right),
\]
and, therefore,
\[
\max_{A\in S^n}\;\mathbb{E}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)}\right] \ge (1-o(1))\times\begin{cases} 1 & \text{for } m = o(\ln n),\\[3pt] 1-\alpha & \text{for } m \le \frac{\ln n-\ln(\ln^2 n)}{4\ln\big(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\big)},\ \alpha\in(0,1],\\[3pt] \frac{.249\,\ln n}{m^2\ln\ln n} & \text{for } m = O(\ln n),\\[3pt] \frac{.637}{m^2} & \text{for } m = o\big(n^{1/2}/\ln^{1/2}n\big). \end{cases}
\]

Proof. Let $\varphi_1, \ldots, \varphi_n$ be the eigenvectors corresponding to $\lambda_1(A) \ge \ldots \ge \lambda_n(A)$. By the inequalities
\[
\lambda_i^{(m)}(A,b) \le \max_{\substack{x\in\mathcal{K}_m(A,b),\ x\ne0\\ x^T\varphi_j = 0,\ j=1,\ldots,i-1}} \frac{x^TAx}{x^Tx} \le \max_{\substack{x\in\mathcal{K}_m\left(A,\,b-\sum_{j=1}^{i-1}(\varphi_j^Tb)\varphi_j\right)\\ x\ne0}} \frac{x^TAx}{x^Tx} = \lambda_1^{(m)}\Big(A,\; b-\sum_{j=1}^{i-1}(\varphi_j^Tb)\varphi_j\Big),
\]
the relative error satisfies
\[
\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} \ge \frac{\lambda_i(A)-\lambda_1^{(m)}\big(A,\,b-\sum_{j=1}^{i-1}(\varphi_j^Tb)\varphi_j\big)}{\lambda_1(A)-\lambda_n(A)}.
\]
The right-hand side corresponds to an extremal eigenvalue problem of dimension $n-i+1$. Setting the largest eigenvalue of $A$ to have multiplicity $i$, and defining the eigenvalues $\lambda_i, \ldots, \lambda_n$ based on the eigenvalues (corresponding to dimension $n-i+1$) used in the proofs of Lemmas 3.4, 3.7, and 3.10, completes the proof. □

Next, we provide an upper bound for the relative error in approximating $\lambda_i$ under the assumption of non-zero gaps between the eigenvalues $\lambda_1, \ldots, \lambda_i$. We present the following theorem.


Theorem 6.2. Let $n \ge 100$, $m \ge 9+i$, $p \ge 1$, and $A \in S^n$. Then
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_i(A)-\lambda_n(A)}\right)^{p}\right]^{1/p} \le .068\,\frac{\ln^2\!\big(\delta^{-2(i-1)}\,n\,(m-i)^{8p}\big)}{(m-i)^2}
\]
and
\[
\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\lambda_i(A)-\lambda_i^{(m)}(A,b)}{\lambda_i(A)-\lambda_n(A)} \le .571\,\frac{\ln^2\!\big(\delta^{-2(i-1)/3}\,n\,(m-i)^{2/3}\big)}{(m-i)^2}\right] \ge 1-o(1/n),
\]
where
\[
\delta = \frac{1}{2}\min_{k=2,\ldots,i}\frac{\lambda_{k-1}(A)-\lambda_k(A)}{\lambda_1(A)-\lambda_n(A)}.
\]

Proof. As in previous cases, it suffices to prove the theorem for matrices $A$ with $\lambda_i(A) = 1$ and $\lambda_n(A) = 0$ (if $\lambda_i = \lambda_n$, we are done). Similar to the polynomial representation of $\lambda_1^{(m)}(A,b)$, the Ritz value $\lambda_i^{(m)}(A,b)$ corresponds to finding the polynomial in $\mathcal{P}_{m-1}$ with zeros $\lambda_k^{(m)}$, $k = 1, \ldots, i-1$, that maximizes the corresponding Rayleigh quotient. For the sake of brevity, let $\phi_i(x) = \prod_{k=1}^{i-1}\big(\lambda_k^{(m)}-x\big)^2$. Then $\lambda_i^{(m)}(A,b)$ can be written as
\[
\lambda_i^{(m)}(A,b) = \max_{P\in\mathcal{P}_{m-i}} \frac{\sum_{j=1}^n b_j^2P^2(\lambda_j)\phi_i(\lambda_j)\lambda_j}{\sum_{j=1}^n b_j^2P^2(\lambda_j)\phi_i(\lambda_j)},
\]
and, therefore, the error is bounded above by
\[
\lambda_i(A)-\lambda_i^{(m)}(A,b) \le \min_{P\in\mathcal{P}_{m-i}} \frac{\sum_{j=i+1}^n b_j^2P^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)}{\sum_{j=1}^n b_j^2P^2(\lambda_j)\phi_i(\lambda_j)}.
\]

The main idea of the proof is very similar to that of Lemmas 4.1 and 4.2, paired with a pigeonhole principle. The intervals $[\lambda_j(A)-\delta\lambda_1,\ \lambda_j(A)+\delta\lambda_1]$, $j = 1, \ldots, i$, are disjoint, and so there exists some index $j^*$, $1 \le j^* \le i$, such that the corresponding interval does not contain any of the Ritz values $\lambda_k^{(m)}(A,b)$, $k = 1, \ldots, i-1$.

We begin by bounding the expected relative error. As in the proof of Lemma 4.1, by integrating over chi-square random variables and using Cauchy–Schwarz, we have
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\lambda_i(A)-\lambda_i^{(m)}(A,b)\right)^{p}\right]^{1/p} \le \min_{P\in\mathcal{P}_{m-i}}\left[\int_{[0,\infty)^n}\left(\frac{\sum_{j=i+1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)}\right)^{p/2^q} f_Y(y)\,dy\right]^{1/p}
\]
\[
\le \min_{P\in\mathcal{P}_{m-i}}\left\{\left[\int_{[0,\infty)^n}\left(\frac{\sum_{j:\lambda_j<\beta} y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)}\right)^{p/2^q} f_Y(y)\,dy\right]^{1/p} + \left[\int_{[0,\infty)^n}\left(\frac{\sum_{j:\lambda_j\in[\beta,1]} y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)}\right)^{p/2^q} f_Y(y)\,dy\right]^{1/p}\right\}
\]
for $q \in \mathbb{N}$, $2p < 2^q \le 4p$, and any $\beta \in (0,1)$. The second term on the right-hand side is bounded above by $1-\beta$, and the integrand of the first term is bounded above by
\[
\left(\frac{\sum_{j:\lambda_j<\beta} y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)}\right)^{p/2^q} \le \left(\frac{\sum_{j:\lambda_j<\beta} y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)^{2^q}}{\sum_{j=1}^n y_jP^2(\lambda_j)\phi_i(\lambda_j)}\right)^{1/4} \le \frac{\max_{x\in[0,\beta]}|P(x)|^{1/2}\,\phi_i^{1/4}(x)\,(1-x)^{2^{q-2}}}{|P(\lambda_{j^*})|^{1/2}\,\phi_i^{1/4}(\lambda_{j^*})}\left(\frac{\sum_{j:\lambda_j<\beta} y_j}{y_{j^*}}\right)^{1/4}.
\]


By replacing the minimizing polynomial in $\mathcal{P}_{m-i}$ by $T_{m-i}\big(\tfrac{2x}{\beta}-1\big)$, the maximum is achieved at $x = 0$, and, by the monotonicity of $T_{m-i}$ on $[1,\infty)$,
\[
T_{m-i}\left(\frac{2\lambda_{j^*}}{\beta}-1\right) \ge T_{m-i}\left(\frac{2}{\beta}-1\right) \ge \frac{1}{2}\exp\!\big\{2\sqrt{1-\beta}\,(m-i)\big\}.
\]
In addition,
\[
\frac{\phi_i^{1/4}(0)}{\phi_i^{1/4}(\lambda_{j^*})} = \prod_{k=1}^{i-1}\left|\frac{\lambda_k^{(m)}}{\lambda_k^{(m)}-\lambda_{j^*}}\right|^{1/2} \le \delta^{-(i-1)/2}.
\]

We can now bound the $p$-norm by
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\lambda_i(A)-\lambda_i^{(m)}(A,b)\right)^{p}\right]^{1/p} \le \left[\frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\,\frac{n^{1/4}}{\delta^{(i-1)/2}}\right]^{1/p} e^{-\gamma(m-i)/p} + \gamma^2,
\]
where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-i}\ln\!\big(\delta^{-(i-1)/2p}\,n^{1/4p}(m-i)^2\big)$ (assuming $\gamma < 1$; otherwise our bound is already greater than one, and trivially holds), we obtain
\[
\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\lambda_i(A)-\lambda_i^{(m)}(A,b)\right)^{p}\right]^{1/p} \le \frac{\left(\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}\right)^{1/p} + \frac{1}{16}\ln^2\!\big(\delta^{-2(i-1)}\,n\,(m-i)^{8p}\big)}{(m-i)^2} \le .068\,\frac{\ln^2\!\big(\delta^{-2(i-1)}\,n\,(m-i)^{8p}\big)}{(m-i)^2},
\]

for $m \ge 9+i$, $n \ge 100$. This completes the proof of the expected error estimate. The probabilistic estimate follows relatively quickly, and is very similar in nature to the proof of Lemma 4.2. We have

\[
\lambda_i(A)-\lambda_i^{(m)}(A,b) \le \min_{P\in\mathcal{P}_{m-i}} \frac{\sum_{j=i+1}^n Y_jP^2(\lambda_j)\phi_i(\lambda_j)(1-\lambda_j)}{\sum_{j=1}^n Y_jP^2(\lambda_j)\phi_i(\lambda_j)} \le \min_{P\in\mathcal{P}_{m-i}} \frac{\max_{x\in[0,\beta]} P^2(x)\phi_i(x)(1-x)}{P^2(\lambda_{j^*})\,\phi_i(\lambda_{j^*})}\,\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta)
\]
\[
\le \delta^{-2(i-1)}\,T_{m-i}^{-2}\!\left(\frac{2}{\beta}-1\right)\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta) \le 4\,\delta^{-2(i-1)}\exp\!\big\{-4\sqrt{1-\beta}\,(m-i)\big\}\,\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta).
\]

By Proposition 3.3,
\[
\mathbb{P}\left[\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} \ge n^{3.02}\right] \le \mathbb{P}\big[Y_{j^*} \le n^{-2.01}\big] + \mathbb{P}\Big[\sum_{j\ne j^*} Y_j \ge n^{1.01}\Big] \le \big(e/n^{2.01}\big)^{1/2} + \big(n^{.01}e^{1-n^{.01}}\big)^{(n-1)/2} = o(1/n).
\]

Let $\sqrt{1-\beta} = \frac{\ln\big(\delta^{-2(i-1)}\,n^{3.02}(m-i)^2\big)}{4(m-i)}$. Then, with probability $1-o(1/n)$,
\[
\lambda_i(A)-\lambda_i^{(m)}(A,b) \le \frac{4}{(m-i)^2} + \frac{\ln^2\big(\delta^{-2(i-1)}\,n^{3.02}(m-i)^2\big)}{16(m-i)^2}.
\]

The $4/(m-i)^2$ term is dominated by the log term as $n$ increases, and, therefore, with probability $1-o(1/n)$,
\[
\lambda_i(A)-\lambda_i^{(m)}(A,b) \le .571\,\frac{\ln^2\big(\delta^{-2(i-1)/3}\,n\,(m-i)^{2/3}\big)}{(m-i)^2}. \qquad\square
\]


For typical matrices with no repeated eigenvalues, $\delta$ is usually a very low degree polynomial in $n$, and, for $i$ constant, the estimates for $\lambda_i$ are not much worse than those for $\lambda_1$. In addition, it seems likely that, given any matrix $A$, a small random perturbation of $A$ before the application of the Lanczos method will satisfy the condition of the theorem with high probability and change the eigenvalues by a negligible amount. However, it is possible that, with a different proof, the same (or improved) estimates can be shown without any assumptions on eigenvalue gaps. Of course, the bounds from Theorem 6.2 for maximal eigenvalues $\lambda_i$ also apply to minimal eigenvalues $\lambda_{n-i}$.
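As an illustration of how the gap quantity $\delta$ enters, the snippet below (ours; the helper name `theorem62_bound` is hypothetical, and the spectrum is assumed sorted in descending order with $i \ge 2$) evaluates the probabilistic bound of Theorem 6.2 for a uniform spectrum, where $\delta$ is of order $1/n$, so the bound degrades only logarithmically in $n$.

```python
import numpy as np

def theorem62_bound(lams, i, m, n):
    """Probabilistic bound of Theorem 6.2 for the i-th eigenvalue (1-indexed,
    i >= 2), given a spectrum lams sorted in descending order."""
    gaps = (lams[:i - 1] - lams[1:i]) / (lams[0] - lams[-1])
    delta = gaps.min() / 2.0                      # delta from Theorem 6.2
    return 0.571 * np.log(delta ** (-2 * (i - 1) / 3) * n * (m - i) ** (2 / 3)) ** 2 / (m - i) ** 2

# usage: uniform spectrum on [0, 1], bound for lambda_2 after m = 50 iterations
n, m, i = 10**4, 50, 2
print(theorem62_bound(np.linspace(1.0, 0.0, n), i, m, n))
```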

Next, we make use of the upper and lower bounds of Theorems 1.2 and 1.3 to produce error estimates for the condition number of a symmetric positive definite matrix. Let $\kappa(A) = \lambda_1(A)/\lambda_n(A)$ denote the condition number of a matrix $A \in S^n_{++}$, and let $\kappa^{(m)}(A,b) = \lambda_1^{(m)}(A,b)/\lambda_m^{(m)}(A,b)$ denote the condition number of the tridiagonal matrix $T_m \in S^m_{++}$ resulting from $m$ iterations of the Lanczos method applied to a matrix $A \in S^n_{++}$ with initial vector $b$. Their difference can be written as

\[
\kappa(A)-\kappa^{(m)}(A,b) = \frac{\lambda_1(A)}{\lambda_n(A)} - \frac{\lambda_1^{(m)}(A,b)}{\lambda_m^{(m)}(A,b)} = \frac{\lambda_1(A)\,\lambda_m^{(m)}(A,b) - \lambda_n(A)\,\lambda_1^{(m)}(A,b)}{\lambda_n(A)\,\lambda_m^{(m)}(A,b)}
\]
\[
= \frac{\lambda_m^{(m)}(A,b)\big(\lambda_1(A)-\lambda_1^{(m)}(A,b)\big) + \lambda_1^{(m)}(A,b)\big(\lambda_m^{(m)}(A,b)-\lambda_n(A)\big)}{\lambda_n(A)\,\lambda_m^{(m)}(A,b)}
\]
\[
= (\kappa(A)-1)\left[\frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} + \kappa^{(m)}(A,b)\,\frac{\lambda_m^{(m)}(A,b)-\lambda_n(A)}{\lambda_1(A)-\lambda_n(A)}\right],
\]

which leads to the bounds
\[
\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa(A)} \le \frac{\lambda_1(A)-\lambda_1^{(m)}(A,b)}{\lambda_1(A)-\lambda_n(A)} + \kappa(A)\,\frac{\lambda_m^{(m)}(A,b)-\lambda_n(A)}{\lambda_1(A)-\lambda_n(A)}
\]
and
\[
\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (\kappa(A)-1)\,\frac{\lambda_m^{(m)}(A,b)-\lambda_n(A)}{\lambda_1(A)-\lambda_n(A)}.
\]
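In computations, $\kappa^{(m)}(A,b)$ is read off directly from the extreme Ritz values of $T_m$. A minimal sketch of ours follows (same exact-arithmetic Krylov model as earlier; the helper name `kappa_m` is hypothetical); it also illustrates that $\kappa^{(m)}$ never exceeds $\kappa$ for positive definite $A$, since $\lambda_1^{(m)} \le \lambda_1$ and $\lambda_m^{(m)} \ge \lambda_n$.

```python
import numpy as np

def kappa_m(A, b, m):
    """kappa^{(m)}(A, b) = lambda_1^{(m)} / lambda_m^{(m)}, computed from the
    extreme Ritz values of the Krylov subspace K_m(A, b)."""
    K = np.empty((A.shape[0], m))
    K[:, 0] = b / np.linalg.norm(b)
    for k in range(1, m):
        K[:, k] = A @ K[:, k - 1]
    Q, _ = np.linalg.qr(K)
    ritz = np.linalg.eigvalsh(Q.T @ A @ Q)    # ascending Ritz values
    return ritz[-1] / ritz[0]

rng = np.random.default_rng(2)
n, m = 300, 20
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                    # symmetric positive definite
b = rng.standard_normal(n)
print(kappa_m(A, b, m), "<=", np.linalg.cond(A))   # kappa^{(m)} underestimates kappa
```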

Using Theorems 1.2 and 1.3, and Minkowski’s inequality, we have the following two corollaries.

Corollary 6.3.
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{E}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)}\right] \ge (1-o(1))\,(\kappa-1)\times\begin{cases} 1 & \text{for } m = o(\ln n),\\[3pt] 1-\alpha & \text{for } m \le \frac{\ln n-\ln(\ln^2 n)}{4\ln\big(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\big)},\ \alpha\in(0,1],\\[3pt] \frac{.249\,\ln n}{m^2\ln\ln n} & \text{for } m = O(\ln n),\\[3pt] \frac{.637}{m^2} & \text{for } m = o\big(n^{1/2}/\ln^{1/2}n\big), \end{cases}
\]
and
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{E}_{b\sim U(S^{n-1})}\left[\left(\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa(A)}\right)^{p}\right]^{1/p} \le .068\,(\kappa+1)\,\frac{\ln^2\!\big(n(m-1)^{8p}\big)}{(m-1)^2}, \qquad\text{for } n \ge 100,\ m \ge 10,\ p \ge 1.
\]

Corollary 6.4.
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (1-o(1))(\kappa-1)\right] \ge 1-o(1/n) \qquad\text{for } m = o(\ln n),
\]
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge (1-o(1))(\kappa-1)(1-\alpha)\right] \ge 1-o(1/n) \qquad\text{for } m \le \frac{\ln n-\ln(\ln^2 n)}{4\ln\big(\frac{1+\sqrt{2}}{\sqrt{\alpha}}\big)},\ \alpha\in(0,1],
\]
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge .249\,\frac{(\kappa-1)\ln n}{m^2\ln\ln n}\right] \ge 1-o(1/n) \qquad\text{for } m = O(\ln n),
\]
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge .637\,\frac{\kappa-1}{m^2}\right] \ge 1-o(1/n) \qquad\text{for } m = o\!\left(\frac{n^{1/2}}{\ln^{1/2}n}\right),
\]
and
\[
\sup_{\substack{A\in S^n_{++}\\ \kappa(A)=\kappa}}\;\mathbb{P}_{b\sim U(S^{n-1})}\left[\frac{\kappa(A)-\kappa^{(m)}(A,b)}{\kappa(A)} \le .571\,(\kappa+1)\,\frac{\ln^2\!\big(n(m-1)^{2/3}\big)}{(m-1)^2}\right] \ge 1-o(1/n) \qquad\text{for all } m > 1.
\]

As mentioned in Section 1, it is not possible to produce uniform bounds for the relative error of $\kappa^{(m)}(A,b)$. Corollaries 6.3 and 6.4 produce the first lower bounds for this error, albeit with a linear dependence on the value of the condition number itself.

7. Experimental Results

In this section, we present a number of experimental results that illustrate the error estimates of the Lanczos method in practice. We consider four different spectra:

• eigenvalues of the 1D Laplacian with Dirichlet boundary conditions, $\Lambda_{lap} = \{2+2\cos(i\pi/(n+1))\}_{i=1}^n$,
• a uniform partition of $[0,1]$, $\Lambda_{unif} = \{(n-i)/(n-1)\}_{i=1}^n$,
• eigenvalues from the semicircle distribution, $\Lambda_{semi} = \{\lambda_i\}_{i=1}^n$, where $1/2 + \big(\lambda_i\sqrt{1-\lambda_i^2} + \arcsin\lambda_i\big)/\pi = (n-i)/(n-1)$, $i = 1, \ldots, n$,
• eigenvalues corresponding to Lemma 3.10, $\Lambda_{log} = \{1 - [(n+1-i)/n]^{\ln\ln n/\ln n}\}_{i=1}^n$.

For each one of these spectra, we perform tests for dimensions $n = 10^i$, $i = 5, 6, 7, 8$. For each dimension, we generate 100 random vectors $b \sim U(S^{n-1})$ and perform $m = 100$ iterations of the Lanczos method on each vector. In Figures 1 and 2, we report the results of these experiments. In particular, for each spectrum, we plot the empirical average relative error $(\lambda_1-\lambda_1^{(m)})/(\lambda_1-\lambda_n)$ for each dimension as $m$ varies from 1 to 100. In addition, for $n = 10^8$, we present a box plot for each spectrum, illustrating the variability in relative error that each spectrum possesses.
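Since the error depends on $A$ only through its spectrum and on $b$ through its projections onto the eigenvectors, diagonal matrices with the four spectra above suffice for these experiments. The snippet below is ours (the helper name `test_spectra` is hypothetical, and the semicircle spectrum is obtained by numerically inverting its CDF on a fine grid); it constructs the spectra for a given $n$.

```python
import numpy as np

def test_spectra(n):
    """The four test spectra of Section 7 (each returned as a length-n array)."""
    i = np.arange(1, n + 1)
    lap = 2.0 + 2.0 * np.cos(i * np.pi / (n + 1))      # 1D Dirichlet Laplacian
    unif = (n - i) / (n - 1.0)                          # uniform partition of [0, 1]
    # semicircle: invert the CDF 1/2 + (x sqrt(1-x^2) + arcsin x)/pi on a grid
    grid = np.linspace(-1.0, 1.0, 10**6)
    cdf = 0.5 + (grid * np.sqrt(1.0 - grid**2) + np.arcsin(grid)) / np.pi
    semi = np.interp((n - i) / (n - 1.0), cdf, grid)
    log = 1.0 - ((n + 1.0 - i) / n) ** (np.log(np.log(n)) / np.log(n))  # Lemma 3.10
    return lap, unif, semi, log
```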

In Figure 1, we note that the plots of $\Lambda_{lap}$, $\Lambda_{unif}$, and $\Lambda_{semi}$ all illustrate an empirical average error estimate that scales with $m^{-2}$ and has no dependence on dimension $n$ (for $n$ sufficiently large). This is in line with the theoretical results of the paper, most notably Theorem 5.3, as all of these spectra exhibit suitable convergence of their empirical spectral distribution, and the related integral minimization problems can all be shown to have solutions of order $m^{-2}$. In addition, the box plots corresponding to $n = 10^8$ illustrate that the relative error for a given iteration number has a relatively small variance compared to the average. For instance, all extreme values remain within a range of length less than $.01\,m^{-2}$ for $\Lambda_{lap}$, $.1\,m^{-2}$ for $\Lambda_{unif}$, and $.4\,m^{-2}$ for $\Lambda_{semi}$. This is also in line with the convergence in probability of Theorem 5.3. The empirical spectral distributions of all three spectra converge to shifted and scaled versions of Jacobi weight functions, namely, $\Lambda_{lap}$ corresponds to $\omega_{-1/2,-1/2}(x)$, $\Lambda_{unif}$ to $\omega_{0,0}(x)$, and $\Lambda_{semi}$ to $\omega_{1/2,1/2}(x)$. Over these three spectra, the expected relative error increases with the choice of $\alpha$ in the corresponding weight function. This is consistent with the dependence on $\alpha$ in the bounds produced in Lemma 3.6. Two additional unanticipated properties suggested by Figure 1 are that the variance and the dimension $n$ required to observe asymptotic behavior both appear to increase with $\alpha$.


Figure 1. Plot and box plot of relative error times $m^2$ vs. iteration number $m$ for $\Lambda_{lap}$ (panels a, b), $\Lambda_{unif}$ (panels c, d), and $\Lambda_{semi}$ (panels e, f). Each plot contains one curve per dimension $n$ tested; each curve represents the empirical average relative error for each value of $m$, averaged over 100 random initializations. Each box plot illustrates the variability of relative error for $n = 10^8$: for a given $m$, the 25th and 75th percentiles of the values, denoted by $q_1$ and $q_3$, are the bottom and top of the corresponding box, and the red line in the box is the median. The whiskers extend to the most extreme points in the interval $[q_1 - 1.5(q_3-q_1),\ q_3 + 1.5(q_3-q_1)]$, and outliers not in this interval correspond to the '+' symbol.


Figure 2. Plot and box plot of relative error times $m^2$ vs. iteration number $m$ for $\Lambda_{log}$ (panels a, b). The plot contains one curve per dimension $n$ tested; each curve represents the empirical average relative error for each value of $m$, averaged over 100 random initializations. The box plot illustrates the variability of relative error for $n = 10^8$, with boxes, medians, whiskers, and outliers as in Figure 1.

In Figure 2, we observe the plot and box plot for $\Lambda_{log}$, and note the dependence on dimension $n$. The plot of relative error for $\Lambda_{log}$ illustrates that relative error does indeed depend on $n$, and is increasing with $n$ in what appears to be a logarithmic fashion. For instance, when looking at the average relative error scaled by $m^2$, the maximum over all iteration numbers $m$ appears to increase somewhat logarithmically (it is $\approx 4$ for $n = 10^5$, $\approx 5$ for $n = 10^6$, $\approx 6$ for $n = 10^7$, and $\approx 7$ for $n = 10^8$). In addition, the box plot for $n = 10^8$ illustrates that this spectrum exhibits a large degree of variability and is susceptible to extreme outliers. These numerical results support the theoretical lower bounds of Section 3, and illustrate that the asymptotic theoretical lower bounds which depend on $n$ do occur in practical computations.

illustrates that this spectrum exhibits a large degree of variability and is susceptible to extreme outliers.These numerical results support the theoretical lower bounds of Section 3, and illustrate that the asymptotictheoretical lower bounds which depend on n do occur in practical computations.

Acknowledgments

The author would like to thank Alan Edelman, Michel Goemans, and Steven Johnson for interesting conversations on the subject, and Louisa Thomas for improving the style of presentation. The author was supported in part by ONR Research Contract N00014-17-1-2177.

References

[1] Jack Cuzick. A strong law for weighted sums of i.i.d. random variables. Journal of Theoretical Probability, 8(3):625–641, 1995.
[2] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
[3] Philip J. Davis and Philip Rabinowitz. Methods of Numerical Integration. Courier Corporation, 2007.
[4] Gianna M. Del Corso and Giovanni Manzini. On the randomized error of polynomial methods for eigenvector and eigenvalue estimates. Journal of Complexity, 13(4):419–456, 1997.
[5] Kathy Driver, Kerstin Jordaan, and Norbert Mbuyi. Interlacing of the zeros of Jacobi polynomials with different parameters. Numerical Algorithms, 49(1-4):143, 2008.
[6] Klaus-Jürgen Förster and Knut Petras. On estimates for the weights in Gaussian quadrature in the ultraspherical case. Mathematics of Computation, 55(191):243–264, 1990.
[7] Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
[8] Shmuel Kaniel. Estimates for some computational techniques in linear algebra. Mathematics of Computation, 20(95):369–378, 1966.
[9] Jacek Kuczyński and Henryk Woźniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094–1122, 1992.
[10] Jacek Kuczyński and Henryk Woźniakowski. Probabilistic bounds on the extremal eigenvalues and condition number by the Lanczos algorithm. SIAM Journal on Matrix Analysis and Applications, 15(2):672–691, 1994.
[11] Geno Nikolov. Inequalities of Duffin–Schaeffer type II. Institute of Mathematics and Informatics at the Bulgarian Academy of Sciences, 2003.
[12] Christopher Conway Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London, 1971.
[13] Beresford N. Parlett, H. Simon, and L. M. Stringer. On estimating the largest eigenvalue with the Lanczos algorithm. Mathematics of Computation, 38(157):153–165, 1982.
[14] Yousef Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM Journal on Numerical Analysis, 17(5):687–706, 1980.
[15] Yousef Saad. Numerical Methods for Large Eigenvalue Problems: Revised Edition, volume 66. SIAM, 2011.
[16] Jie Shen, Tao Tang, and Li-Lian Wang. Spectral Methods: Algorithms, Analysis and Applications, volume 41. Springer Science & Business Media, 2011.
[17] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra, volume 50. SIAM, 1997.
[18] Jos L. M. Van Dorsselaer, Michiel E. Hochstenbach, and Henk A. Van Der Vorst. Computing probabilistic bounds for extreme eigenvalues of symmetric matrices with the Lanczos method. SIAM Journal on Matrix Analysis and Applications, 22(3):837–852, 2001.
[19] Herbert S. Wilf. Finite Sections of Some Classical Inequalities, volume 52. Springer Science & Business Media, 2012.

Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
E-mail address: [email protected]